feature: improve provenance and make q2-preview editable#231
Draft
gordonwoodhull wants to merge 104 commits into
gordonwoodhull
added a commit
that referenced
this pull request
May 25, 2026
The hub-client-e2e.yml `paths:` filter only fires the workflow when a commit touches `hub-client/**` or the workflow file itself. It does not follow transitive Rust deps, so PRs that modify upstream crates the WASM bundle depends on (`quarto-core`, `quarto-pandoc-types`, `quarto-source-map`, `pampa`, `quarto-ast-reconcile`, `wasm-quarto-hub-client`, etc.) silently skip e2e.
Two recent misses:
- f96f56d (Carlos, 5/22): WASM-incompatible `Instant::now()` and `pollster::block_on` introduced in `quarto-core` broke 8 hub-client WASM tests on main. e2e never ran because the change was under `crates/`, not `hub-client/`.
- PR #231 (feature/provenance, this branch): 57 files modified across `crates/` and `ts-packages/`, zero under `hub-client/`. e2e silently skipped on every push despite the PR materially changing the WASM bundle's behavior.
Fix: drop the `paths:` filter outright and match the trigger shape of the sibling heavy workflows (`test-suite.yml`, `ts-test-suite.yml`). Also adds a `concurrency:` block (lifted from `test-suite.yml`) so superseded runs on a PR get cancelled in flight, which keeps the runner cost from compounding.
Closes bd-izh3. The original ask there was to add a PR trigger with a *broader* path filter; that approach still wouldn't catch the upstream-crate case, so we go the coarser route the issue's spirit calls for. The runner-sizing open question in bd-izh3 is also resolved: ae8274a confirmed `ubuntu-latest` (2 cores, 2 Playwright workers) handles the full suite in 5.3-8.1 min.
`kyoto` deliberately omitted from the branch list: `origin/kyoto` last moved 2026-02-02 and is 825 commits behind main; the sibling workflows still reference it but that's cargo-cult.
gordonwoodhull
added a commit
that referenced
this pull request
May 25, 2026
…nce) bd-izh3 closed by 016894a on feature/provenance (PR #231). The patch drops the hub-client-e2e.yml path filter outright so the workflow fires on every PR like the sibling heavy workflows — strictly broader than the original 'add PR trigger with broader filter' proposal, since path filters can never follow transitive Rust deps. Incidental: bd-cxara has its 'source_repo_path' field stripped (was a stale absolute path from shikokuchuo's local clone; harmless flush).
318ab48 to
4ee51e4
760dacd to
c9dcc6f
Audit and revise Plans 3-8 of the q2-preview series (now framed
internally as the provenance epic) after a design discussion that
followed the q2-preview pipeline and attribution work landing on main.
Major design changes folded into the plans:
- **Plan 4 unified Generated variant.** Collapse the earlier
`Synthetic` + `Derived` split into one `Generated { by, anchors: Vec<Anchor> }`
shape. Atomicity is per-`by.kind` (orthogonal to anchors); the
invocation source byte range is the first anchor with role
`AnchorRole::Invocation`. One wire-format code (4) instead of two.
- **Plan 4/5/6 typed anchors (Path C).** Instead of stuffing
source-info chain metadata into `by.data` (dynamic JSON), the chain
is a typed `Vec<Anchor>` where each `Anchor` carries an `Arc<SourceInfo>`
and a role-labeled `AnchorRole` (`Invocation`, `ValueSource`,
`Other(String)`). `by.data` shrinks to per-kind non-source-info
configuration. Two future-anchor roles flagged as follow-ups
contingent on metadata-loader and Lua-file-registration work.
- **Plan 6 uniform shortcode anchor stamping.** Single funnel covers
Rust built-ins, Lua-loaded extension handlers, and user-extension
shortcodes uniformly via a post-walk `stamp_shortcode_anchors` helper.
Enrichment-via-post-walk preserves Lua-attached `by.data` fields
(lua_path, lua_line) while promoting `by.kind` to `shortcode`.
Attribution interaction documented: multi-author shortcodes get
latest-wins via the existing `query_byte_range` max-time logic
composed with chain-walking through the `Invocation` anchor.
- **Plan 5 latent code-3 bug now reachable.** Plans 1-2 shipped the
q2-preview pipeline that runs filters whose output crosses the JSON
boundary; the FilterProvenance code-3 round-trip bug is no longer
latent in production. Added end-to-end production-reachability
regression test using the `{{< kbd Ctrl+C >}}` fixture (kbd.lua
constructs a Span that gets FilterProvenance-tagged and then
shortcode-stamped). Drops code 5 from the design.
- **Plan 7 SPA edit-back in scope.** The new q2 preview CLI command
serves a separate SPA from ts-packages/preview-renderer; both
hub-client and the SPA share the writer machinery via @quarto/preview-runtime.
Plan 7 now covers replacing `noopSetAst` in the SPA with a real
handler that routes through `incrementalWriteQmd` to
`syncClient.updateFileContent` and the ephemeral hub's automerge↔disk
bridge. Adds a small SPA-local `DiagnosticStrip` for Q-3-42/Q-3-43;
hub-client's existing diagnostics-banner handles the same warnings
there. Single-file mode (bd-tnm3k) works through the same automerge
stack — no special case.
- **Plan 8 wrapper stays Original.** Explicit reasoning added for
why `CustomNode("IncludeExpansion")` uses Original source_info
(CustomNode.type_name carries generator identity; the wrapper
substitutes 1:1 for the source-mapped Paragraph). The HTML pipeline's
resolve transform sits in the Normalization Phase (symmetric with
CalloutResolveTransform); HTML doesn't attribute the include line
because there's no DOM anchor for it, which is accepted v1 behavior.
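As a rough illustration of the unified `Generated` shape and typed anchors described in the Plan 4/5/6 bullets, the sketch below models a `Generated { by, anchors }` variant and looks up the first `Invocation` anchor. It is a simplified stand-in, not the plan's actual definitions: the `Original` fields, the `By` struct layout, and the `invocation_anchor` method body are assumptions made for the example.

```rust
use std::sync::Arc;

/// Simplified stand-in for the real source-info type (field layout is assumed).
#[derive(Debug, Clone, PartialEq)]
pub enum SourceInfo {
    /// A byte range read directly from a source file.
    Original { file_id: usize, start: usize, end: usize },
    /// Content produced by a generator (shortcode, filter, ...) rather than read from source.
    Generated { by: By, anchors: Vec<Anchor> },
}

/// Who generated the content. Only `kind` is modelled here; per-kind data is omitted.
#[derive(Debug, Clone, PartialEq)]
pub struct By {
    pub kind: String,
}

/// Role label attached to each anchor in the chain.
#[derive(Debug, Clone, PartialEq)]
pub enum AnchorRole {
    Invocation,
    ValueSource,
    Other(String),
}

/// One typed link back to a piece of source.
#[derive(Debug, Clone, PartialEq)]
pub struct Anchor {
    pub role: AnchorRole,
    pub source_info: Arc<SourceInfo>,
}

impl SourceInfo {
    /// Returns the first anchor whose role is `Invocation`, if there is one.
    pub fn invocation_anchor(&self) -> Option<&Anchor> {
        match self {
            SourceInfo::Generated { anchors, .. } => {
                anchors.iter().find(|a| a.role == AnchorRole::Invocation)
            }
            SourceInfo::Original { .. } => None,
        }
    }
}
```

Keeping the role on the anchor rather than in `by.data` means the lookup is a plain typed search, with no JSON inspection involved.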
Mechanical changes also folded in:
- Rename `Synthetic` → `Generated` throughout the type vocabulary in
all plans.
- Update JS-side hand-mirror file paths (`hub-client/src/utils/...`
→ `ts-packages/preview-renderer/src/utils/...`) to reflect the
Phase-D package split.
- Each plan's intro reframed as part of the provenance epic; file
names keep the q2-preview-plan-N form for continuity.
File renames for clarity about which filters each plan covers:
- `…plan-3-filter-idempotence.md` → `…plan-3-builtin-filter-idempotence.md`
- `…plan-7a-filter-idempotence.md` → `…plan-7a-user-filter-idempotence.md`
Plans 3-8 remain in design state on this branch; no code changes yet.
Audit pass over the provenance epic's idempotence story, scoping Plan 3 to pipeline non-determinism only and propagating the consequences to the neighbouring plans.
Plan 3 (builtin transform and filter idempotence):
- Retitle to "Built-in transform and filter idempotence verification", symmetric across Rust transforms and Lua filters (prior framing was too narrow).
- Enumerate the actual universe under test: 36 Rust transforms in build_q2_preview_transform_pipeline (4 excluded, named with reasons), ~20 stage-level items in build_q2_preview_pipeline_stages, and the one Lua filter under resources/extensions/ (video-filter.lua). The prior "~10-20 filters" estimate misread shortcodes as filters.
- Drop the "Plan 3 strengthening" round-trip amendment that was added alongside Plan 7a in commit 2129d35. Round-trip non-idempotence is not exercised by today's pipeline; CI-time round-trip testing conflates writer-lossiness with filter-non-idempotence; 7a's runtime check is the better home for the property when Plan 7's writer ships. Trim "Two flavors" section to a pointer at 7a.
- Add compute_meta_hash_fresh / compute_meta_hash_fresh_excluding_rendered as a new helper in quarto-ast-reconcile, parallel to the existing block hasher. Hash covers blocks + meta (excluding rendered.*).
- Rewrite test pseudocode against the real run_pipeline API at pipeline.rs:626.
- Add fixture-format constraint: no executable engine cells (CI has no kernels).
- Coverage gap audit: ~25 fixtures across the document-level, Lua shortcode, website-project, attribution, and resource categories. Includes lua-shortcode-version, lua-shortcode-lipsum-fixed (non-random path), and video-filter-header for the one built-in Lua filter.
- Convert to a development-plan format with a seven-phase work-items checklist.
- Close the engine-staleness open question via filter.rs:158 (fresh Lua::new() per invocation).
- Clarify the lua-filter-pipeline reference as TypeScript Quarto porting material, not the Rust inventory.
Plan 6 (provenance audit):
- Add a §Test plan bullet for source_info determinism: Plan 3's hashes exclude source_info by design, so a per-fixture source_info-equality check is Plan 6's own responsibility.
Plan 7 (incremental writer):
- Add a writer-lossless baseline test as the first §Test plan bullet, prerequisite for the reconciler tests. Reuses Plan 3's fixture set.
- Add Plan 3 to §References and §Dependencies (soft-depends-on via compute_meta_hash_fresh).
Plan 7a (runtime user-filter idempotence):
- Remove all references to the now-deleted "Plan 3 strengthening" section (five locations including a full subsection).
- Reframe the out-of-scope bullet from "Strengthening Plan 3" to "Extending the runtime round-trip check to built-in filters," with three-point v1-acceptance reasoning in §Notes.
- Update §Design decisions, §Dependencies, and §References to reflect the new shape and the shared compute_meta_hash_fresh helper.
- Add the meta-hash comparison to step 4 of the round-trip check.
No code changes; design state only.
…ailure policy
Hash helper: `merge_op` participates (verified `MergeOp::default() =
Concat` is a stable compile-time constant); `Map` entries hashed in
insertion order, no sort (an idempotence test should *catch* the kind
of HashMap-iteration-order non-determinism a sort would mask). Adds
regression-guard unit tests for both choices.
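To make the no-sort choice concrete, here is a minimal sketch of an insertion-order-sensitive hasher in which the merge operator also participates. It is not the `compute_meta_hash_fresh` implementation: `Node`, `Value`, the `Prefer` variant, and the `scalar`/`map` constructors are hypothetical names for this example, and it uses std's `DefaultHasher` purely for convenience.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical merge operator. The source only confirms that `Concat` is the default.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash, Default)]
pub enum MergeOp {
    #[default]
    Concat,
    Prefer,
}

/// Simplified config value: a scalar or a map that remembers insertion order.
pub enum Value {
    Scalar(String),
    Map(Vec<(String, Node)>),
}

pub struct Node {
    pub value: Value,
    pub merge_op: MergeOp,
}

/// Convenience constructor for a scalar node with the default merge op.
pub fn scalar(text: &str) -> Node {
    Node { value: Value::Scalar(text.to_string()), merge_op: MergeOp::default() }
}

/// Convenience constructor for a map node; entry order is preserved as given.
pub fn map(entries: Vec<(&str, Node)>) -> Node {
    let entries = entries.into_iter().map(|(k, v)| (k.to_string(), v)).collect();
    Node { value: Value::Map(entries), merge_op: MergeOp::default() }
}

fn hash_node(node: &Node, h: &mut DefaultHasher) {
    // The merge op is part of the hashed content.
    node.merge_op.hash(h);
    match &node.value {
        Value::Scalar(s) => {
            0u8.hash(h);
            s.hash(h);
        }
        Value::Map(entries) => {
            1u8.hash(h);
            entries.len().hash(h);
            // Deliberately no sort: entries are fed in insertion order, so a
            // transform that emits keys in unstable order changes the hash.
            for (key, child) in entries {
                key.hash(h);
                hash_node(child, h);
            }
        }
    }
}

/// Structural hash of a config tree.
pub fn meta_hash(node: &Node) -> u64 {
    let mut h = DefaultHasher::new();
    hash_node(node, &mut h);
    h.finish()
}
```

With this shape, `map(vec![("a", ..), ("b", ..)])` and the same entries in the order `b, a` hash differently, which is the behaviour the regression-guard tests are meant to pin down.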
Test runner: drives every fixture through both `DriveMode::SingleFile`
(direct `run_pipeline`) and `DriveMode::ProjectOrchestrator`
(`ProjectPipeline<RenderToPreviewAstRenderer>`) so orchestrator-only
non-determinism (project discovery, ProjectIndex assembly, file-iteration
order) is also under test. Website/chrome fixtures are
orchestrator-only by design.
Failure policy: failing fixtures stay **failing** — no auto-`#[ignore]`.
Each failure files a beads issue whose description doubles as a
sub-agent investigation prompt. The integration branch holds the
queue; merge to main waits until drained or the user explicitly opts
to ignore.
New helper `find_first_divergence` (alongside the hashers) returns
`DivergencePoint::{Block { index }, MetaKey { path }, None}` so the
test driver's panic message — and therefore the sub-agent prompt —
arrives with a concrete starting point instead of just "hash diverged."
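A sketch of how such a localization helper could work, under simplifying assumptions: instead of real blocks and config values it takes precomputed per-block hashes and `(key, subtree_hash)` pairs in insertion order. The `DivergencePoint` variants follow the description above; the input types and the flat `path` string are assumptions for the example.

```rust
/// Where two pipeline runs first disagree.
#[derive(Debug, PartialEq, Eq)]
pub enum DivergencePoint {
    Block { index: usize },
    MetaKey { path: String },
    None,
}

/// Drops the top-level `rendered` entry, which the hash excludes by design.
fn visible(meta: &[(String, u64)]) -> Vec<&(String, u64)> {
    meta.iter().filter(|entry| entry.0.as_str() != "rendered").collect()
}

/// Compares per-block hashes first, then top-level meta entries in insertion order.
pub fn find_first_divergence(
    blocks_a: &[u64],
    blocks_b: &[u64],
    meta_a: &[(String, u64)],
    meta_b: &[(String, u64)],
) -> DivergencePoint {
    let shared = blocks_a.len().min(blocks_b.len());
    for index in 0..shared {
        if blocks_a[index] != blocks_b[index] {
            return DivergencePoint::Block { index };
        }
    }
    if blocks_a.len() != blocks_b.len() {
        // One run produced extra blocks; report the first unmatched index.
        return DivergencePoint::Block { index: shared };
    }

    let a = visible(meta_a);
    let b = visible(meta_b);
    for (i, entry) in a.iter().enumerate() {
        match b.get(i) {
            Some(other) if other.0 == entry.0 && other.1 == entry.1 => {}
            _ => return DivergencePoint::MetaKey { path: entry.0.clone() },
        }
    }
    if b.len() > a.len() {
        // The second run has a key the first run lacks.
        return DivergencePoint::MetaKey { path: b[a.len()].0.clone() };
    }
    DivergencePoint::None
}
```

The returned value can be embedded directly in a panic message, so the investigation starts from a block index or key path.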
Orchestrator-mode `DocumentAst` extraction: researched the data flow;
the typed AST is materialized inside `render_qmd_to_preview_ast` but
discarded after JSON serialization. Plan recommends adding `pub ast:
DocumentAst` to `PreviewAstOutput` and forwarding through
`WasmPassTwoOutput`; alternatives (JSON re-parse, test-only hook)
documented with their costs.
Fixture rules: no absolute process paths in fixture content (built-in
extensions extract to a `temp_dir` whose path differs across CI runs;
stable within a single process — fine for two-runs-compare, but a
latent issue for future stored-snapshot variants).
Smaller corrections: `Format::from_format_string("q2-preview")` (no
`Format::q2_preview()` constructor exists); `apply_lua_filter`
(singular) is the per-filter Lua-state-creation site, with the plural
loop calling it once per filter; `LuaShortcodeEngine::new` is the
shortcode-side analogue; `quarto/video` filter extension is built-in
via `include_dir!(resources/extensions)` and auto-discovered by
`StageContext::new`, so fixtures need no scaffolding beyond `filters:
[video]` in YAML; `meta.rendered.includes.*` is the actual path
(not `meta.includes.*`) and includes contributions from
`IncludeResolveStage`, chrome render transforms, `attribution_viewer`,
and Bootstrap/clipboard injection — all skipped by
`compute_meta_hash_fresh_excluding_rendered`.
Stage-inventory clarifications: `MathJsStage` is excluded from
q2-preview; `BootstrapJsStage` and `ClipboardJsStage` write only to
`ctx.artifacts` (not to `meta` or `blocks`), so they don't affect the
hash — but their q2-preview inclusion is questionable and is filed
separately as bd-2ag1c.
Notes for the next traversal: `CodeHighlightStage`'s native disk scan
for user grammars is OS-order-dependent (not exercised today;
fixtures don't supply user grammars); lipsum's module-load
`math.randomseed(os.time())` is harmless on the non-random code path
the fixture exercises but should be reverified if a future variant
routes through `math.random`.
Estimated scope: ~760 → ~980 lines.
…branch policy
Audit pass against current source. Settles every open question that
remained in the prior revision and corrects factual drift.
Reuse over rebuild
- `DriveMode::ProjectOrchestrator` now delegates to the existing
`render_active_page_preview` helper at
`crates/quarto-core/tests/render_page_in_project.rs:660`. No fresh
orchestrator wiring; no `make_website_project_ctx(...)` builder.
- `DocumentAst` extraction settled on option (a): re-parse the JSON
via `pampa::readers::json::read`. source_info round-trips but the
hash excludes it, so no stripping pass and no production plumbing
change is required. Earlier option (b) (typed-AST plumbing through
`PreviewAstOutput` / `WasmPassTwoOutput`) abandoned.
- `run_orchestrator` code sample updated: real body in place of the
prior `unimplemented!("see Open questions")` stub.
Test crate location pinned
- File: `crates/quarto-core/tests/idempotence.rs`.
- Fixtures: `crates/quarto-core/tests/fixtures/idempotence/`.
- Cargo invocation in the sub-agent prompt template updated to
`--test idempotence`.
Long-lived branch policy made explicit
- New `## Long-lived branch policy` section at the top.
- `## Goal` clarifies that "CI-enforced" applies when the plan lands
on `main`; until then `feature/provenance` is allowed to be red
while the failure queue drains.
- `### Phase 5 — Failure triage` opens with the same constraint.
Factual fixes against current source
- Transform count corrected from 36 to 37; missing
`table-bootstrap-class` added to Finalization, with a fixture
entry in the gap audit and Phase 4 checklist.
- `Q2_PREVIEW_STAGE_EXCLUDED` corrected to list all three exclusions
(`math-js`, `render-html-body`, `apply-template`).
- `CodeHighlightStage` user-grammar scan citation moved from
`pipeline.rs:644-650` to
`crates/quarto-core/src/transforms/code_highlight.rs:126-129`.
- Stale line numbers refreshed throughout (pipeline.rs 1181→1198,
1220→1237, 379→380, 355→356, 626→627, 855→859, 663→664;
render_page_in_project.rs 653→660; Pass2Payload::AstJson 256→254;
stage/context.rs 220→221; ShortcodeResolveTransform::transform
257→513 with the correct file path).
- bd-2ag1c ordering pinned: Plan 3 lands first; bd-2ag1c follows
with Plan 3's measurements in hand.
Section rename: "Open questions for implementation" →
"Decisions (was: open questions)" + a `### CI failure policy &
sub-agent prompt template` subsection. All internal cross-refs
updated.
Estimate revised
- Scaffolding line item: ~260 → ~100 lines (reuse, not rebuild).
- `PreviewAstOutput::ast` plumbing (~20 lines) removed entirely.
- Total: ~980 → ~800 lines.
- Session count revised 2 → 2-3 with the third explicitly allocated
to Phase 5 triage.
Adds the structural-hash infrastructure that Plan 3's q2-preview idempotence gate (and Plan 7a's runtime user-filter check) will sit on:
- compute_meta_hash_fresh: source-info-agnostic ConfigValue hasher. Insertion-order Map keys (no sort, so HashMap-iteration-order bugs in transforms remain detectable). MergeOp participates via its enum discriminant. Recurses into PandocInlines/PandocBlocks via the existing inline/block hashers (which already exclude source_info).
- compute_meta_hash_fresh_excluding_rendered: same, but skips the top-level `rendered` map entry. The exclusion is intentionally not propagated into recursion: a nested `rendered` key is content.
- find_first_divergence + DivergencePoint: returns the first block index whose per-block fresh hash differs, or the first insertion-order meta key path whose subtree hash differs (with the same rendered.* exclusion). The plan-sketch signature took &DocumentAst, but quarto-ast-reconcile cannot depend on quarto-core; the helper takes &[Block] + &ConfigValue and the test driver projects from DocumentAst.
- 11 new unit tests cover: same/different content, source_info/key_source agnosticism, top-level rendered exclusion, nested rendered participation, Map insertion-order sensitivity (no-sort regression guard), MergeOp sensitivity; identical/Block-mismatch/MetaKey-path/rendered-skip divergence localization.
Verification: `cargo nextest run --workspace`: 9321 passed, 196 skipped. `cargo xtask verify --skip-hub-build` steps 1-5 green (lint, fmt, Rust build with -D warnings, tree-sitter, Rust tests with -D warnings). Steps 7/10 fail with the known --skip-hub-build artifact (`wasm-quarto-hub-client` unbuilt), unrelated to these additive Rust changes.
Refs: claude-notes/plans/2026-05-04-q2-preview-plan-3-builtin-filter-idempotence.md
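The top-level-only `rendered` exclusion can be sketched as follows. This is an illustration under assumptions, not the helper that ships in quarto-ast-reconcile: `Meta` is a hypothetical two-variant stand-in for `ConfigValue`, and `hash_excluding_rendered` is an invented name.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical stand-in for a config value: a string or an insertion-ordered map.
pub enum Meta {
    Str(String),
    Map(Vec<(String, Meta)>),
}

/// Full recursive hash; every key participates, including nested `rendered`.
fn hash_into(meta: &Meta, h: &mut DefaultHasher) {
    match meta {
        Meta::Str(s) => {
            0u8.hash(h);
            s.hash(h);
        }
        Meta::Map(entries) => {
            1u8.hash(h);
            for (key, value) in entries {
                key.hash(h);
                hash_into(value, h);
            }
        }
    }
}

/// Hashes a meta tree but skips the `rendered` entry at the top level only.
pub fn hash_excluding_rendered(meta: &Meta) -> u64 {
    let mut h = DefaultHasher::new();
    match meta {
        Meta::Map(entries) => {
            1u8.hash(&mut h);
            for (key, value) in entries.iter().filter(|(k, _)| k.as_str() != "rendered") {
                key.hash(&mut h);
                // Recursion uses the full hasher, so a nested `rendered` key still counts.
                hash_into(value, &mut h);
            }
        }
        other => hash_into(other, &mut h),
    }
    h.finish()
}
```

Two documents that differ only in their top-level `rendered` entry hash the same, while a difference under a nested `rendered` key still changes the hash.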
Adds the test driver that Phases 3-4 will hang ~25 fixtures off.
Self-contained at `crates/quarto-core/tests/idempotence.rs`.
- `DriveMode { SingleFile, ProjectOrchestrator }`. Single-file calls
`run_pipeline` with `build_q2_preview_pipeline_stages`. Orchestrator
drives `ProjectPipeline<RenderToPreviewAstRenderer>` via the existing
`render_active_page_preview` body (copied inline because each
`tests/*.rs` is its own binary).
- `Fixture { name, setup, active, modes }` + `run_fixture` runs the
pipeline twice per (fixture, mode), hashes blocks via
`compute_blocks_hash_fresh` and meta via
`compute_meta_hash_fresh_excluding_rendered`, and on divergence
panics with `find_first_divergence`'s `DivergencePoint` embedded so
the panic message itself fills the plan's sub-agent investigation
prompt template.
- `pandoc_to_document_ast` is the small field-shuffle that the plan
identifies: orchestrator mode emits `Pass2Payload::AstJson`, which
`pampa::readers::json::read` re-parses into `(Pandoc, ASTContext)`;
the hasher only reads `ast.blocks` + `ast.meta` so the other
`DocumentAst` fields get defaults.
- `tests/fixtures/idempotence/README.md` documents the fixture-format
rules (no engine cells, no absolute paths, per-fixture mode mapping).
- `smoke_plain_paragraph` smoke fixture drives a single-paragraph
document through both modes. Passing this proves the harness works
end-to-end before Phases 3-4 land the real fixtures.
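The run-twice-and-compare core of such a driver might look like the sketch below. It is deliberately decoupled from the real pipeline: the `render` closure stands in for whatever drives `run_pipeline` or the orchestrator, and `Hashes` and `check_idempotent` are hypothetical names. The real driver panics with a `DivergencePoint`; this version returns a `Result` to keep the example small.

```rust
/// The two ways a fixture is driven through the pipeline.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum DriveMode {
    SingleFile,
    ProjectOrchestrator,
}

/// Structural hashes of one pipeline run (blocks and meta hashed separately).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Hashes {
    pub blocks: u64,
    pub meta: u64,
}

/// Runs `render` twice per mode and reports the first mode whose hashes differ.
pub fn check_idempotent<F>(name: &str, modes: &[DriveMode], mut render: F) -> Result<(), String>
where
    F: FnMut(DriveMode) -> Hashes,
{
    for &mode in modes {
        let first = render(mode);
        let second = render(mode);
        if first != second {
            return Err(format!(
                "fixture `{name}` diverged in {mode:?}: first {first:?}, second {second:?}"
            ));
        }
    }
    Ok(())
}
```

Hashing blocks and meta separately lets the failure message say which half moved, which is the distinction the website-links triage later relies on.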
Verification: `cargo nextest run -p quarto-core --test idempotence`
runs the new smoke test (PASS). `cargo xtask verify
--skip-hub-build --skip-hub-tests` steps 1-9 green; the Phase-1
idempotence tests and this Phase-2 smoke test ran inside Step 5.
Step 10 (preview-renderer integration tests in
`ts-packages/preview-renderer/`) fails with the same WASM-import
artifact as Step 7 — both depend on `wasm-quarto-hub-client` which
`--skip-hub-build` skips. Unrelated to these Rust-only additions.
Refs: claude-notes/plans/2026-05-04-q2-preview-plan-3-builtin-filter-idempotence.md
Adds the existing-fixture batch the plan calls "carry-forward from prior plan draft": one fixture per Rust transform / feature that was already exercised in earlier idempotence drafts, scoped to single-file document fixtures that run in both DriveMode variants.
Coverage:
- meta-single, meta-markdown: shortcode-resolve + metadata-normalize (string and PandocInlines branches).
- include-trivial: include-expansion stage + shortcode-resolve.
- callout-warning: CalloutTransform (callout-resolve is excluded from q2-preview, so the CustomNode survives).
- theorem: TheoremSugarTransform.
- figure-ref-target: FloatRefTargetSugarTransform.
- crossref-to-theorem: crossref-index + crossref-resolve.
- sectionize-multi: SectionizeTransform across nested headers.
- footnotes-mixed: FootnotesTransform on inline + reference forms.
- appendix-license: AppendixStructureTransform with license/copyright meta and a footnote interaction.
- combined-stress: sectionize + callouts + shortcodes interacting.
A `doc_fixture(name, content)` helper collapses each single-file fixture to a one-liner; `include-trivial` keeps an inline closure because it writes two files.
All 12 idempotence tests (smoke + 11 new) pass: `cargo nextest run -p quarto-core --test idempotence` → 12 passed. No queue entries for Phase 5 from this batch; the carry-forward fixtures are all clean on first run.
Refs: claude-notes/plans/2026-05-04-q2-preview-plan-3-builtin-filter-idempotence.md
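For readers unfamiliar with the pattern, a `doc_fixture`-style helper could be shaped roughly like this. The `Fixture` field names come from the driver description; the field types, the `index.qmd` file name, and the default mode list are assumptions for the sketch and may not match `idempotence.rs`.

```rust
use std::fs;
use std::io;
use std::path::Path;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum DriveMode {
    SingleFile,
    ProjectOrchestrator,
}

/// One idempotence fixture: how to lay out its files and how to drive it.
pub struct Fixture {
    pub name: &'static str,
    /// Writes the fixture's files under the given project root.
    pub setup: Box<dyn Fn(&Path) -> io::Result<()>>,
    /// Path of the document to render, relative to the project root.
    pub active: &'static str,
    pub modes: &'static [DriveMode],
}

/// Builds a single-file fixture from inline content (file name is an assumption).
pub fn doc_fixture(name: &'static str, content: &'static str) -> Fixture {
    Fixture {
        name,
        setup: Box::new(move |root: &Path| fs::write(root.join("index.qmd"), content)),
        active: "index.qmd",
        modes: &[DriveMode::SingleFile, DriveMode::ProjectOrchestrator],
    }
}
```

A multi-file fixture such as `include-trivial` would instead supply its own `setup` closure that writes each file it needs.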
npm install (from repo root) and npm run build:wasm (from hub-client) updated package-lock.json and crates/wasm-quarto-hub-client/Cargo.lock on this branch. Committed so subsequent fresh checkouts of feature/provenance can build WASM from the same dependency set.
Adds the batch of Phase-4 fixtures that need no scaffolding beyond a single-file `setup`. Per the long-lived-integration-branch policy, fixtures that surface non-idempotence stay in the suite as the triage queue.
Pass on first run (both DriveModes):
- code-block-fenced: code-block-generate / -render / code-highlight.
- proof: ProofSugarTransform.
- equation-labeled: EquationLabelTransform + crossref-resolve (eq).
- toc-on: toc-generate, toc-render.
- video-filter-header: built-in Lua filter under `resources/extensions/quarto/video/`.
- theme-bootstrap: compile-theme-css stage.
- table-bootstrap-class: TableBootstrapClassTransform.
- lua-shortcode-version: Lua-loaded shortcode handler (returns `quarto.version`).
In the queue:
- **lua-shortcode-lipsum-fixed**: `SingleFile` passes; the pipeline itself is idempotent. `ProjectOrchestrator` panics with `MalformedSourceInfoPool` re-parsing the AST JSON the orchestrator emitted. This is a JSON writer/reader round-trip bug specific to lipsum-shortcode-generated inlines, not a transform-determinism finding. Filed as **bd-3odjm**. The test stays red per the plan's "do not #[ignore]" rule; the integration branch is allowed to carry the failure until the queue is drained.
Verification: `cargo nextest run -p quarto-core --test idempotence` → 20 passed, 1 failed (bd-3odjm). Plan-1 unit tests and Phase-3 fixtures all green.
Refs:
- claude-notes/plans/2026-05-04-q2-preview-plan-3-builtin-filter-idempotence.md
- bd-3odjm
Both pass on first run in both DriveMode variants.
- include-in-header writes a tiny header.html and references it from front matter; exercises IncludeResolveStage.
- resource-image writes a 67-byte minimal PNG and references it via inline image syntax; exercises ResourceCollectorTransform.
Adds a write_bytes helper for the binary stub. Per the fixtures README rule the PNG sits at the project root and is referenced relatively (`./local.png`).
Verification: `cargo nextest run -p quarto-core --test idempotence` → 22 passed, 1 failed (bd-3odjm).
Three orchestrator-only website fixtures. Two pass, one in queue.
Pass:
- website-chrome: navbar + sidebar + page-navigation + page-footer + favicon + bootstrap-icons + canonical-url + title-prefix. Two pages (index, other), tiny favicon stub.
- website-listing: listing with categories enabled and feed: true, two posts under posts/, each with categories. Exercises listing-generate / -render, categories-sidebar, listing-feed-link, listing-feed-stage, listing-item-info.
In the queue:
- website-links: internal cross-page `.qmd` body links. Filed as bd-rz2we. Block 0 hash diverges across runs while meta hash is stable, so the divergence is genuinely in the AST blocks (not in rendered chrome). Hypothesis: link-rewrite or link-resolution is capturing the absolute project root (or canonicalized tempdir path) into the AST when it should emit a path-independent relative URL.
Verification: `cargo nextest run -p quarto-core --test idempotence` → 24 passed, 2 failed (bd-3odjm, bd-rz2we).
Refs:
- claude-notes/plans/2026-05-04-q2-preview-plan-3-builtin-filter-idempotence.md
- bd-rz2we
Extends Fixture with an optional attribution_json: Option<&'static str>. When present:
- SingleFile installs PreBuiltAttributionProvider on RenderContext.attribution_provider before run_pipeline.
- ProjectOrchestrator forwards the JSON via RenderToPreviewAstRenderer::with_attribution; the renderer installs the same provider type on the per-page RenderContext it constructs internally.
Stub JSON has one actor + one run covering bytes 0..1024 (a wider range than the fixture body actually uses) so the attribution map overlaps the entire document and AttributionGenerateStage + AttributionRenderTransform have something to write into the AST.
`cargo nextest run -p quarto-core --test idempotence` → 25 passed, 2 failed (bd-3odjm, bd-rz2we; both pre-existing). attribution_basic passes on first run in both DriveModes, so the deterministic provider + generate + render stack is genuinely idempotent.
This completes the Phase 4 fixture set. The Plan-3 gate now covers:
- 1 smoke fixture
- 11 carry-forward (Phase 3, all green)
- 9 Phase-4a doc fixtures (8 green, 1 in queue)
- 2 Phase-4b multi-file (both green)
- 3 Phase-4c website (2 green, 1 in queue)
- 1 Phase-4d attribution (green)
Total: 27 fixtures, 25 green, 2 in queue.
Refs:
- claude-notes/plans/2026-05-04-q2-preview-plan-3-builtin-filter-idempotence.md
- bd-3odjm (Plan 5 will fix), bd-rz2we
Adds claude-notes/instructions/idempotence-contract.md, the author-facing summary of the contract Plan 3 enforces. Covers:
- what the hash includes and excludes (source-info blind, insertion-order maps, merge_op participates, rendered.* excluded at top level only);
- what new transforms must NOT do (undefined iteration order, process-local state, absolute paths, engine cells);
- the fresh-Lua-state-per-run rule for Lua filters / shortcodes;
- how to add a fixture (doc_fixture for trivial, inline closure for multi-file, ORCHESTRATOR_ONLY for chrome, attribution_json for attribution exercises);
- the long-lived-integration-branch policy: don't #[ignore] a failing fixture without explicit user approval.
Cross-linked from:
- crates/quarto-core/tests/fixtures/idempotence/README.md (existing pointer expanded to point at the contract doc and the plan).
- claude-notes/plans/2026-05-04-q2-preview-plan-7a-user-filter-idempotence.md (References section, so authors looking at the runtime user-filter check find the CI contract too).
Refs: claude-notes/plans/2026-05-04-q2-preview-plan-3-builtin-filter-idempotence.md
cargo nextest run --workspace: 9346/9348 pass. The 2 failures are the documented queue items (bd-3odjm, bd-rz2we); every other workspace test is green, including the 25 passing idempotence fixtures.
cargo xtask verify (full WASM stack): Steps 1-4 green; Step 5 fails on the same 2 fixtures. That's the expected long-lived-integration-branch state per the plan's §Long-lived branch policy: the gate is allowed to be red until the queue is drained.
Plan 3 is complete as a deliverable: gate + hashing infrastructure + 27 fixtures + author-facing docs + filed queue. Merge to main gated on draining the queue (bd-3odjm via Plan 5; bd-rz2we via a follow-up).
Refs: claude-notes/plans/2026-05-04-q2-preview-plan-3-builtin-filter-idempotence.md
The Work-items section under Phase 1-7 was fully checked, but the parallel "Coverage gaps to address during implementation" inventory (per-fixture bullets, line ~560+) still showed unchecked boxes even though every fixture in that list now ships in idempotence.rs. Marked all 26 inventory items as landed. Annotated the two that are in the Phase-5 triage queue (lipsum-fixed → bd-3odjm, website-links → bd-rz2we) so the queue state is also visible from the inventory, not just from the Phase-5 work-items block. Plan checklist is now fully consistent: 54 checked, 0 unchecked.
…erContext
Plan 3's website_links fixture was non-idempotent: rendered AST link
URLs captured the absolute tempdir path of the per-run TempDir,
causing block-0 hash divergence across two runs with different
tempdirs. Root cause: `ResourceResolverContext::vfs_root_mode`
played two roles via a single PathBuf — disk-write root (where
runtime.file_write puts theme CSS / copied resources) and URL
prefix (what gets embedded in HTML link/asset URLs). In production
WASM these are intentionally identical; on native they have to
diverge so writes hit a real tempdir but URLs stay path-independent.
Split the field into `{ write_root, url_root }` and add a two-arg
`vfs_root_with_url_root` constructor plus per-renderer
`with_url_root` builder. Single-arg `vfs_root(...)` constructor
preserves the WASM identity contract by construction (write_root ==
url_root). Native test helpers in tests/idempotence.rs and
tests/render_page_in_project.rs now pass
`.with_url_root("/.quarto/project-artifacts")`, so rendered URLs
embed the synthetic prefix while disk writes still land in the
tempdir.
website_links now passes; 25/26 idempotence fixtures pass. The
remaining lipsum failure is bd-3odjm (FilterProvenance wire
format), owned by Plan 5 and out of scope here. Workspace nextest:
9347/9348. cargo xtask verify (Rust leg) clean for lint/fmt/build
with -D warnings.
Plan: claude-notes/plans/2026-05-21-vfs-url-write-root-split.md
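The split described above can be pictured with a small Rust sketch. Only the names `write_root`, `url_root`, `vfs_root`, `vfs_root_with_url_root`, and `with_url_root` come from the commit message; the `VfsRoots` type, the `write_path` / `url_for` helpers, and the use of a `String` for the URL prefix are assumptions made for illustration, and the real `ResourceResolverContext` presumably carries more state.

```rust
use std::path::PathBuf;

/// Hypothetical, simplified stand-in for the resolver's VFS-root mode.
#[derive(Debug, Clone, PartialEq)]
pub struct VfsRoots {
    /// Where file writes land on disk (a per-run tempdir on native).
    write_root: PathBuf,
    /// Prefix embedded in rendered link/asset URLs.
    url_root: String,
}

impl VfsRoots {
    /// Single-arg constructor: both roles share one root, so
    /// write_root == url_root holds by construction (the WASM case).
    pub fn vfs_root(root: &str) -> Self {
        Self { write_root: PathBuf::from(root), url_root: root.to_string() }
    }

    /// Two-arg constructor: the roles diverge (the native case).
    pub fn vfs_root_with_url_root(write_root: &str, url_root: &str) -> Self {
        Self { write_root: PathBuf::from(write_root), url_root: url_root.to_string() }
    }

    /// Builder that overrides only the URL prefix.
    pub fn with_url_root(mut self, url_root: &str) -> Self {
        self.url_root = url_root.to_string();
        self
    }

    /// Disk location for a resource (hypothetical helper).
    pub fn write_path(&self, rel: &str) -> PathBuf {
        self.write_root.join(rel)
    }

    /// URL embedded in rendered output (hypothetical helper).
    pub fn url_for(&self, rel: &str) -> String {
        format!("{}/{}", self.url_root.trim_end_matches('/'), rel)
    }
}
```

With this shape, two runs that write into different tempdirs but share a synthetic URL root produce identical URLs, which is the property the idempotence hash depends on.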
Plan 4 (SourceInfo provenance types) finalized for development:
- 7-phase work-items checklist (types → constructors → accessor updates → Lua serde → migration → tests → verification gate)
- field renamed `anchors` → `from` (typed `SmallVec<[Anchor; 1]>` from day 1; serde feature required on smallvec)
- accessor semantics for `Generated` pinned: length/start_offset/end_offset → 0, map_offset → None, resolve_byte_range / remap_file_ids / extract_file_id delegate to invocation_anchor
- required-Invocation-anchor invariant on `shortcode` kind documented with `By::shortcode` doc-comment requirement; enforcement split across Plan 6 audit test and Plan 7 debug_assert
- Lua-table discriminant pinned to `t = "Generated"`
- §Test plan and Phase 6 expanded to cover every accessor + mutator + the `combine()` × Generated corner
- migration scope corrected (15 files, 27 occurrences); references and line ranges verified against the worktree source
- §Open questions section removed (no open questions remain)
Cross-plan `from` rename swept across Plans 3, 5, 6, 7, 8.
Plan 5 JSON wire format (option D):
- outer JSON key `anchors` → `from` (matches Rust field name)
- inner anchor pool reference `from` → `si_id` (distinctive; avoids the `parent_id` tree-structure mental model that fits Substring's chain but not anchor references)
- Reader/writer code samples updated; TS-side `SourceInfoEntry` shape note updated
Plan 6 + Plan 7 hand-offs for the required-anchor invariant added.
Deferred follow-ups (Dispatch anchor, ValueSource anchor) cross-referenced as bd-36fr9 and bd-129m3 (committed separately to main).
Plan 4 work happens on top of an integration branch carrying exactly one failing test (lua_shortcode_lipsum_fixed orchestrator mode, filed as bd-3odjm). That test's root cause is the wire-format code-3 collision Plan 5 owns, so Plan 4 must not try to fix it locally.
Plan 4:
- New §"Inherited pre-existing failure (bd-3odjm)" section between Out of scope and Work items. Explains the test, the panic shape, the root cause, and that any *other* failure in the idempotence suite is a Plan-4 regression.
- Phase 7 verification gate updated: cargo nextest expects exactly one failure (bd-3odjm); cargo xtask verify trips on the same one.
Plan 5:
- New §"Inherited failure that must close on Plan 5's first reader change (bd-3odjm)" section. Spells out the contract: Plan 5's first reader change must turn lua_shortcode_lipsum_fixed green. If it doesn't, the Plan-5 author has an immediate signal that either the reader discrimination is wrong or the lipsum path produces a code-3 shape neither arm handles; stop and focus on it before moving on.
- Test plan now cites bd-3odjm as the live first-iteration smoke check, ahead of the hand-constructed tests.
Both plans now read consistently with the state of feature/provenance.
Plan 4 committed `from: SmallVec<[Anchor; 1]>` as the field type, but Plan 5's reader/writer + Plan 6's stamper code samples still used the `vec![]` macro to construct it. Those samples would not compile if taken literally: `vec!` produces a `Vec`, not a `SmallVec`.
Switch to `smallvec![]` everywhere `Generated.from` is constructed:
- Plan 5: 4 occurrences (legacy-Transformed code-3 reader; Anchor dedup test description; forward-compat test description; round-trip test description).
- Plan 6: 14 occurrences across §"Per-transform fixes", §"Lua-shortcode enrichment", §"The post-walk helper", §"Variant semantics summary" etc.
No semantic change: same constructions, just the macro that actually returns the field type.
Plan 4 + Plan 5: change Generated.from's inline capacity from SmallVec<[Anchor; 1]> to SmallVec<[Anchor; 2]> so the steady-state post-follow-up shape (Invocation + ValueSource on meta/var; Invocation + Dispatch on Lua-handler shortcodes) stays heap-free. Cost is +16 bytes per empty Generated; saves a heap allocation on every multi-anchor shortcode resolution.
Also folds in research findings that were tacit in the previous draft:
- Phase 1 smallvec line: replace "or verify present" hedge with the concrete two-file Cargo.toml edit (workspace + quarto-source-map), noting verified-absent.
- skip_serializing_if path: use the fully-qualified serde_json::Value::is_null (the short form is a frequent gotcha).
- By::raw policy: accept-all; forgery caught by Plan 6 audit + Plan 7 debug_assert, not by constructor rejection.
- Anchor ordering: append order, stable across serde, at most one anchor per known role.
- extract_file_id: empty-from Generated returns None, matching FilterProvenance's behavior; both call sites in to_ariadne_report already tolerate None. Stays a private fn on DiagnosticMessage.
- Lua serde Concat recursion: legacy "FilterProvenance" inside a Concat piece is handled automatically; no .snap/.json fixtures contain the legacy tag.
- Default risk: no struct holding SourceInfo derives Default in quarto-pandoc-types; Default for SourceInfo itself stays unchanged.
- combine() × Generated: verified unreachable today (all 17 call sites combine Original/Substring shapes); the Phase 6 test documents intent for any future caller.
- PartialEq: no production call site compares SourceInfo today; the derive is required by Block/Inline but not load-bearing.
The previous "+16 bytes per Generated" note understated the cost by ~2.5x. Actual delta:
- Anchor = AnchorRole (32 bytes; the String-bearing Other variant dominates) + Arc<SourceInfo> (8) = ~40 bytes.
- SmallVec<[Anchor; 1]> ≈ 48 bytes; SmallVec<[Anchor; 2]> ≈ 88 bytes on the stack, a 40-byte delta per SmallVec field.
- Since SourceInfo is an enum, its stack size is dictated by the largest variant, so every SourceInfo (Original/Substring/Concat too) grows by 40 bytes, not just Generated instances. Block/Inline carry SourceInfo by value, so the cost multiplies across the AST (tens-to-hundreds of KB on a large doc).
Plan keeps cap=2 (the trade is still defensible) but documents the real cost honestly and notes Arc-boxing Generated as the next lever if memory-per-node ever bites the q2-preview editor.
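The "largest variant dictates the enum size" point can be checked with `std::mem::size_of`. The sketch below uses hypothetical stand-in types (`Role`, `Anchor`, `Info<N>`) and a plain inline array in place of `SmallVec`, so the absolute numbers will not match the real `SourceInfo`; it only illustrates why widening the inline capacity grows every value of the enum, including the variants that carry no anchors.

```rust
use std::mem::{size_of, size_of_val};
use std::sync::Arc;

// Hypothetical stand-ins; not the real quarto-source-map types.
#[allow(dead_code)]
enum Role {
    Invocation,
    Dispatch,
    Other(String), // the String-bearing variant dominates the size
}

#[allow(dead_code)]
struct Anchor {
    role: Role,
    target: Arc<u64>,
}

// N models the inline capacity. A plain array approximates the inline
// storage of a small-vector type (an assumption for illustration).
#[allow(dead_code)]
enum Info<const N: usize> {
    Original { start: usize, end: usize },
    Generated { from: [Anchor; N] },
}

/// Size of one anchor on this target.
pub fn anchor_size() -> usize {
    size_of::<Anchor>()
}

/// Size of the whole enum for inline capacity N.
pub fn info_size<const N: usize>() -> usize {
    size_of::<Info<N>>()
}

/// Size of a value that is an `Original`, which holds no anchors at all.
pub fn original_value_size<const N: usize>() -> usize {
    let v: Info<N> = Info::Original { start: 0, end: 0 };
    size_of_val(&v)
}
```

Comparing `info_size::<1>()` with `info_size::<2>()` on a given target shows the per-node delta, and `original_value_size::<2>()` confirms that an `Original` pays the same price as a `Generated`.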
…::unknown
Drop the Pandoc-flavored naming. q2 isn't pandoc-centric and the
affected call sites aren't all Pandoc (CLI stdin, Lua handoff,
external filter binaries).
Renames:
- json::read_strict + json::read_lenient -> json::read (strict) +
json::read_completing_source_info (the new lenient variant).
The function name matches the surrounding read_<thing> convention
in readers/json.rs (read_inline, read_block, read_attr_source,
make_source_info). Says exactly what it does.
- By::external_pandoc -> By::unknown. Honest about what we know
("we don't know"), generic enough to cover all four outside-world
call sites (qmd-syntax-helper, CLI stdin, external filter, Lua
handoff).
Pool-slot constants chained via + 1 in writers/json.rs so future
reserved slots don't require hardcoded number changes:
pub const USER_EDIT_SOURCE_INFO_ID: usize = 0;
pub const UNKNOWN_SOURCE_INFO_ID: usize = USER_EDIT_SOURCE_INFO_ID + 1;
SourceInfoSerializer::new() pre-pushes the slots in declaration order;
a unit test next to the constants asserts the pool entries match,
so adding or rearranging slots fails the test rather than silently
shifting IDs at consumer sites. The TS hand-mirror follows the same
pattern with a Rust-side CI parity test.
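A minimal sketch of the chained-constant pattern, assuming a much-simplified pool: the two constant names come from the commit message, while `FIRST_FREE_SOURCE_INFO_ID`, `PoolEntry`, and `SourceInfoPool` are hypothetical names invented here and do not mirror the real `SourceInfoSerializer`.

```rust
pub const USER_EDIT_SOURCE_INFO_ID: usize = 0;
pub const UNKNOWN_SOURCE_INFO_ID: usize = USER_EDIT_SOURCE_INFO_ID + 1;
/// Hypothetical helper: first ID handed out to ordinary interns.
pub const FIRST_FREE_SOURCE_INFO_ID: usize = UNKNOWN_SOURCE_INFO_ID + 1;

/// Simplified pool entry (hypothetical shape).
#[derive(Debug, Clone, PartialEq)]
pub enum PoolEntry {
    /// A reserved slot, identified by its provenance kind.
    Reserved(&'static str),
    /// An ordinary source range.
    Original { start: usize, end: usize },
}

pub struct SourceInfoPool {
    entries: Vec<PoolEntry>,
}

impl SourceInfoPool {
    /// Pre-pushes the reserved slots in declaration order, so the
    /// constants above are valid indices in every pool.
    pub fn new() -> Self {
        let mut entries = Vec::new();
        entries.push(PoolEntry::Reserved("user-edit")); // USER_EDIT_SOURCE_INFO_ID
        entries.push(PoolEntry::Reserved("unknown")); // UNKNOWN_SOURCE_INFO_ID
        Self { entries }
    }

    /// Appends an entry and returns its pool ID.
    pub fn intern(&mut self, entry: PoolEntry) -> usize {
        self.entries.push(entry);
        self.entries.len() - 1
    }

    pub fn get(&self, id: usize) -> Option<&PoolEntry> {
        self.entries.get(id)
    }
}
```

A unit test that looks up each constant in a fresh pool and checks the entry kind is what turns a reordering into a test failure instead of a silent ID shift.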
Provenance-contract.md §2 catalog: drop external_pandoc row, add
unknown row noting it's the source_info-completing reader's
placeholder.
Co-authored-by: Claude <noreply@anthropic.com>
Three coordinated changes to the design doc:
1. Define "authored content" upfront, before the BP formal statement. Replaces "node-local content" everywhere. The new term carries both the structural aspect (excludes descendants' bytes) and the semantic aspect (producer-contract attests user authorship). Pipeline-generated nodes have no authored content by definition; the dispatch routes them to non-emitting rules. (P2) now reads cleanly: "the byte was produced by serializing the authored content of a single AST node n." Reader doesn't have to infer the user-authorship scope from the dispatch table.
2. Add the Completeness section as a dual to Soundness. Four clauses partition every byte:
   (C1) Preserved - Source bytes still claimed by AST_new appear.
   (C2) Authored - non-soft-drop nodes' authored content appears.
   (R) Refused - soft-drop sites refuse authored content + warn.
   (D) Deleted - bytes no longer claimed don't appear.
   C-prefix denotes positive completeness (appears in Source'); R/D denote negative cases (doesn't appear). (C1)/(C2) dual (P1)/(P2). "Soft-drop site" defined precisely as "UseAfter or RecurseIntoContainer AND editability gate returns not-editable." R5-special (let-user-win) is explicitly NOT a soft-drop site; it falls under (C2). Proof by structural induction over R1, R1', R2, R2', R5, R3/R4 cases.
3. Rename "What BP does not promise" to "What BP and Completeness do not promise". Reclassify the marker-fidelity / lazy-numbering / block-container shell-regeneration gaps as a single unified completeness gap: helper-emitted bytes don't preserve user-specific syntactic choices. Soundness still holds (helper output is honest authored content via P2); completeness fails for byte-level fidelity of the original syntactic form. Producer-hygiene caveat updated to note both invariants depend on it.
Plan 7d Phase 4 gains a companion property test: completeness_holds (parse(Source') structurally equivalent to AST_new for non-soft-drop inputs), alongside the existing bp_holds (no atomic-Generated bytes leak). The two properties pin both invariants empirically.
Co-authored-by: Claude <noreply@anthropic.com>
Property tests verify every input satisfies the property, but say nothing about which dispatch rules the generator actually exercises. Without coverage assertions, a generator subtly biased toward easy cases (mostly R1, rarely R5-special) gives a false sense of confidence.
Add thread-local DispatchCounters in plan_user_writes, gated behind a dispatch-coverage build feature (zero cost in production). Each dispatch row ticks per visit. Property tests assert per-row minimum coverage after proptest completes; under-exercised rows fail with a specific message naming the row.
Tuned thresholds:
  R1 >= 100 (most common; preserved content)
  R1' (soft-drop) >= 50 (atomic-Generated edit refusal)
  R2 / R2' >= 20 (omit / soft-omit)
  R3-helper >= 50 (new container with helper shells)
  R3-transparent >= 50 (sectionize wrapper recursion)
  R4 >= 30 (inline container preserved shells)
  R5 >= 50 (leaf serialization)
  R5-special >= 20 (let-user-win atomic CustomNode replace)
Keeps the generator honest as the writer evolves: future contributors adding a dispatch row must add a corresponding threshold; a future change that accidentally makes a row unreachable surfaces as a coverage failure rather than passing tests.
Co-authored-by: Claude <noreply@anthropic.com>
Add a framing sub-section at the top of Phase 4 that ties the four testing pieces together as a coordinated strategy:
- Generator (gen_pandoc_with_atomic_descendants): produces ASTs with atomic-Generated descendants at varying depths plus user-edits. Extends the existing quarto-ast-reconcile generators with two new capabilities: atomic-injection at configurable density, and realistic user-edit transformations.
- Marker-string convention for soundness (bp_holds): fresh recognizable marker per iteration injected into atomic-Generated content; one-line assertion that it doesn't appear in Source'.
- Structural-equivalence reuse for completeness (completeness_holds): reuses quarto_ast_reconcile::hash::compute_block_hash (already source-info-blind per hash.rs:498), which absorbs helper-canonicalization gaps at the AST level without bespoke matchers.
- Required dispatch-coverage instrumentation: the full spec stays in the work items below; the intro names it as the fourth coordinated piece.
Closes the loose thread from the conversation: I had offered to write this sub-section but only landed the coverage-counter piece. The four pieces fit together; the intro makes the fit explicit so a future implementer reading Phase 4 understands the strategy before the work items.
Co-authored-by: Claude <noreply@anthropic.com>
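The counter mechanism can be sketched with the standard library alone. The function names (`tick`, `reset`, `count`, `check_coverage`) and the string-keyed map are assumptions for illustration; in the plan the counters live in plan_user_writes behind the dispatch-coverage feature, which a real version would express with `#[cfg(feature = "dispatch-coverage")]`.

```rust
use std::cell::RefCell;
use std::collections::BTreeMap;

thread_local! {
    // One counter map per test thread, keyed by dispatch-row name.
    static DISPATCH_COUNTERS: RefCell<BTreeMap<&'static str, u64>> =
        RefCell::new(BTreeMap::new());
}

/// Record one visit to a dispatch row (hypothetical helper).
pub fn tick(row: &'static str) {
    DISPATCH_COUNTERS.with(|c| {
        *c.borrow_mut().entry(row).or_insert(0) += 1;
    });
}

/// Clear all counters, e.g. before a property-test run.
pub fn reset() {
    DISPATCH_COUNTERS.with(|c| c.borrow_mut().clear());
}

/// Visits recorded for a row so far (0 if never ticked).
pub fn count(row: &str) -> u64 {
    DISPATCH_COUNTERS.with(|c| c.borrow().get(row).copied().unwrap_or(0))
}

/// Fail with a message naming the first row below its minimum.
pub fn check_coverage(thresholds: &[(&'static str, u64)]) -> Result<(), String> {
    for &(row, min) in thresholds {
        let seen = count(row);
        if seen < min {
            return Err(format!("dispatch row {row} under-exercised: {seen} < {min}"));
        }
    }
    Ok(())
}
```

A property test would call `reset()` first, run the generator, and then pass the threshold table to `check_coverage`. Because the counters are thread-local, this only works if ticks and the check happen on the same thread.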
Item 1 (Phase 4 — pool intern dedup): The serializer's intern cache is
strict Arc-pointer equality at parent edges only; it never dedups
top-level intern calls by value. Round-tripped completing-reader nodes
will get fresh pool entries structurally equal to the reserved slots.
Decision: accept the duplication (option a). Bounded, per-document,
cosmetic. Add a one-line comment near intern marking it intentional.
Item 2 (Phase 4 — per-caller reader-split verification): All five
outside-world callers consume source_info downstream, so the placeholder
choice matters. json_filter.rs gets By::filter(filter_path, 0); the
other four get By::unknown(). Signature change: read_completing_source_info
should accept default_by: By rather than baking unknown in, so callers
declare their provenance up front. Flag: qmd-syntax-helper's qmd::write
calls shift dispatch from R1-empty to R5-synthesize — the new behavior
is correct.
Item 3 (Phase 6.5 — reconciler "synthesis sites"): The line numbers in
the earlier draft pointed to test code (AttrSourceInfo::empty field
assignments in #[cfg(test)] blocks), not InlineAttr::new calls. The
three real production InlineAttr::new sites live in pampa's tree-sitter
lowering and pass non-empty attr_source; they need explicit source_info
wired through from the surrounding parse range. By::reconcile_synthesize
becomes a forward-looking primitive; no producer uses it at 7f-landing.
Item 4 (Phase 1 — renderCustomNodeChildren): Verified preserves s: via
{ ...customNode, slots: ... } spread at dispatch.tsx:274. Both CustomBlock
and CustomInline reach the same path. Move both from "needs verification"
to "preserves."
Open questions for review:
- By::filter atomic-kind concern for external filter output (item 2 table).
- Whether read_completing_source_info reuses UNKNOWN_SOURCE_INFO_ID when
default_by == By::unknown() or always allocates fresh (recommend fresh
for uniform path).
…_synthesize, expand 6.5 scope
Decisions locked in (2026-05-30 conversation):
- Keep USER_EDIT_SOURCE_INFO_ID = 0 magic number (framework can't allocate
into the Rust pool; the slot ID must be agreed in advance).
- Drop UNKNOWN_SOURCE_INFO_ID and the second reserved slot. The completing
reader takes `default_by: By` and allocates a fresh pool entry on every
fill. No hand-mirror, no parity test for slot 1, no special case for
`default_by == By::unknown()`.
- Drop By::reconcile_synthesize entirely — no producer uses it at 7f-landing.
- Add By::is_programmatic_sentinel() predicate covering config-default,
programmatic-config, unknown. Replaces the navigation_href.rs equality
check against SourceInfo::default(). No is_default() function needed.
- By::unknown is non-atomic. By::filter is atomic and the right semantic
for json_filter.rs (filter-added nodes shouldn't be source-editable).
Phase 3 walker fix:
The previous walker used a 't' in value heuristic to recurse into
CustomNode slots, which would have misread the Slot wrapper
({ kind, value }) as a non-AST object and silently failed to stamp
anything inside slots. Rewritten to dispatch on slot.kind per the
actual TS Slot discriminated union at
ts-packages/preview-renderer/src/framework/types.ts:123-128.
Phase 6.5 expansion:
Audit found additional production SourceInfo::default sites the plan
missed:
- config_value.rs:822, 826 (insert_path intermediates) → By::programmatic_config
- project_resources.rs:541 (canonicalize_within_project sentinel) → By::unknown
- navigation_href.rs:382 (equality check) → is_programmatic_sentinel pattern
SchemaError::InvalidStructure scope corrected: 4 None sentinel sites
(merge.rs:32/51/88, mod.rs:250), ~11 Some(value.source_info.clone())
sites in helpers.rs, plus a formatter at error.rs:33-46. Plan
previously claimed "four call sites" — undercounted by 3×.
Mechanical fixes:
- InlineAttr::new line numbers 304-311 → 333-348 (the actual location).
- JsonReadError line numbers 23/30 → 25/31.
- writers/json.rs s:-bearing struct range 1010-1116 → 1068-1195.
- Phase 7 deprecated Default impl: file: FileId(0) → file_id: FileId(0).
- Phase 5: clarify the "remove the camelCase fallback" wording (no
real fallback exists; the per-field rename overrides the macro).
- ATOMIC_CUSTOM_NODES Rust + TS paths spelled out for the parity test.
- attr.rs:45-46 stale doc-comment (claims SourceInfo::default fallback;
real consumers fall back to None) noted for cleanup.
…lan 7d trust-point gate
Decisions locked in (2026-06-01):
- PandocNativeIntermediate::IntermediateAttr widens to carry SourceInfo alongside (Attr, AttrSourceInfo). Cleaner provenance than chasing source_info through three uneven call paths; one producer-side update versus three consumer-side refactors.
- q2-debug uses the framework's <Node>, so Phase 3 stampUserEdits comes for free. Only one q2-debug-local renderer (Figure at components.tsx:110) needs the Phase 2 spread-fix.
- Plan 7d's R5 trust point is enforced by `-D deprecated`. After Phase 7 lands the deprecation, denying it in CI turns every remaining SourceInfo::default() caller into a compile error. The compiler is the audit; no separate residue grep step needed.
Audit results (four background agents, 2026-06-01):
1. Cross-crate residue: green. The 447 quarto-core SourceInfo::default hits dramatically overstate exposure. Actual production residue beyond Phase 6.5's list: citeproc/output.rs:1274, quarto-config/materialize.rs:132/152/165, quarto-core/project/listing/feed/stage.rs:596/602. All added to Phase 6.5 work items.
2. derive(Default) on SourceInfo-bearing structs: false alarm. None of the five candidate structs actually contain a SourceInfo field. The deprecation won't fire on them. Phase 8 audit step downgraded to a no-op note.
3. ConfigValue::default semantics shift: safe. Only 2 production callers (include_expansion.rs:203,238); both construct a transient Pandoc wrapper and discard the .meta field without reading it. Migration sound.
4. Snapshot churn: 62 .snap files in crates/pampa/snapshots/json/ (one directory). Other 167 snapshots unaffected. Phase 6's dispatch shift expected to produce zero snapshot diffs (the harness uses real-parsed AST, not defaults). Commit-split recommended: Phase 5 renames first, then Phase 4 pool-shift, then Phase 6 (expect no snap diffs).
Plan now reads end-to-end with bounded scope and a compile-time enforcement mechanism. Ready for implementation.
… for rebase
Prepares feature/provenance for rebase onto origin/main, which landed the integration-test consolidation (#239 / bd-xvdop): every crate now has a single `tests/integration/<name>.rs` + `tests/integration/main.rs` binary instead of one binary per `tests/<name>.rs`.
`idempotence.rs` is the only test file on this branch that is NEW (no counterpart on main), so a straight rebase would land it in the deprecated old layout with zero conflict and zero signal, silently reintroducing the per-file-binary bloat #239 removed, caught by no lint or compile error.
Move it into the new layout now, as an explicit, reviewable, build-verified commit, so the placement is a verified fact before the 85-commit replay rather than a post-rebase chore:
- git mv tests/idempotence.rs -> tests/integration/idempotence.rs
- add tests/integration/main.rs registering `pub mod idempotence;`
On rebase, the new tests/integration/main.rs will collide (add/add) with main's version (~34 modules); resolution is a trivial union (keep main's list + idempotence). That loud conflict is the point: it can't be missed.
Verified on this branch (pre-rebase): integration binary compiles; all 27 idempotence tests pass under `binary(integration)`.
The genuinely-renamed test files (incremental_writer_tests.rs et al.) are left for rebase rename-detection to follow + a post-rebase structural check.
Earlier note implied 7b might use hand-crafted JSON that would need the strict-reader pattern. After reading 7b in full: it's qmd-focused test coverage that constructs ASTs directly in Rust and exercises the qmd writer (`incremental_write`, `compute_blocks_hash_fresh`). No JSON reads or wire-format assertions. 7b ships after 7f. The interaction is API-surface-only — 7b's authors write against the post-7f APIs from the start (for_test, 3-arg InlineAttr::new, widened IntermediateAttr). No rebase work needed.
…_INFO_ID
Plan 7f's 2026-05-30 research findings dropped two earlier-draft items:
- `By::reconcile_synthesize()`: no producer uses it at 7f-landing time; remove from the By:: catalog. Add back later if a reconciler path appears that synthesizes new AST without an input SourceInfo to inherit from.
- `UNKNOWN_SOURCE_INFO_ID` reserved pool slot: the completing reader takes a `default_by: By` parameter and allocates a fresh pool entry per missing `s:`, so there's no slot 1. Rewrite the `By::unknown()` row to describe the actual mechanism.
Brings provenance-contract.md back in sync with the plan; pre-Phase-1 cleanup so the catalog matches what ships.
Wrap-rebuild renderers in `dispatch.tsx` and the q2-debug `Figure`
renderer were emitting a fresh `{ t: '<Tag>', c: newChildren }` object
on every child edit, dropping `s:` (and every other top-level field)
from the rebuilt parent. After Phase 2:
- 19 stripping renderers (Emph/Strong, the five flat inline wrappers
via `makeFlatInlineRenderer`, Link/Image/Span/Quoted,
Para/Plain/Header/BlockQuote/Div, BulletList/OrderedList/Figure) now
rebuild via `{ ...node, c: ... }`.
- q2-debug's local Figure renderer at
`hub-client/src/components/render/q2-debug/components.tsx:110`
gets the same spread treatment.
- `dispatch.test.tsx` covers all 22 entries in the
`renderChildrenRegistry`: 19 that previously failed and the 3 that
already preserved (`Ast`, `CustomBlock`, `CustomInline`).
Preserving `s:` is a precondition for the strict JSON reader landing
in Plan 7f Phase 4. Without it, every child edit rebuilds an ancestor
with no source_info reference, which the strict reader would reject.
…igure s: preservation)
…7f Phase 3)
Wrap `<Node>`'s `setLocalAst` so every AST a user-edit affordance hands
up the chain has `s:` populated on every node. The walker:
- Stamps `s: USER_EDIT_SOURCE_INFO_ID` (slot 0) on any node lacking `s:`.
- Leaves preserved nodes (those with existing `s:`) untouched, so the
Phase 2 rebuilt-wrapper path keeps the original parent's source_info.
- Recurses into `c:` (standard wrapper shape) and `slots:` (CustomNode
shape, dispatched on `slot.kind`).
- Walks nested arrays inside `c:` so Header / Link / BulletList shapes
stamp their inner inline arrays correctly. Tagged-marker values
(`{t: 'DisplayMath'}`, `{t: 'SingleQuote'}`) get a spurious `s:`
field; serde-tag-based reads ignore it (markers are deserialized
by tag, not by struct), so this is harmless.
The atomic-gate noop path skips stamping — wasted work when the edit
is dropped anyway. Stamping is per-node idempotent; outer-level
rewalking of a stamped subtree is a no-op.
`USER_EDIT_SOURCE_INFO_ID = 0` lands in
`ts-packages/preview-renderer/src/types/sourceInfo.ts` here, ahead of
Plan 7f Phase 4's Rust counterpart + hand-mirror parity test.
Three plan-mandated tests + four robustness tests in
`stampUserEdits.test.ts`: fresh Span stamping, rebuilt-wrapper
preservation, splice-in (new + preserved siblings), CustomBlock slot
recursion, `block`/`inline` single-value slot recursion, nested-array
walks (Header c[2], BulletList items), idempotence.
…an 7f Phase 4)
`By::unknown()` is the placeholder kind for nodes deserialized through `json::read_completing_source_info` when the upstream producer doesn't populate `s:` (qmd-syntax-helper's Pandoc subprocess output, CLI `--from json`, Lua AST handoff). Non-atomic by design: nodes carrying this kind remain editable in the preview; user edits re-stamp them as `user_edit` on save.
Extends `test_by_is_atomic_kind` to assert non-atomicity, and adds a `test_by_unknown_constructor` that pins `kind == "unknown"` + null `data`.
Phase 6.5's `is_programmatic_sentinel()` predicate will recognize this kind alongside `config-default` and `programmatic-config`.
…se 4)
Splits the JSON reader's leniency into two named entry points:
- `json::read` becomes strict — nodes missing their `s:` reference fail
with `JsonReadError::MissingSourceInfoRef { node_path }`. The node_path
is best-effort (tag name + parent context); good enough for a debugger
to find the responsible producer site without the plumbing cost of a
precise JSON-pointer.
- `json::read_completing_source_info(reader, default_by: By)` fills
missing `s:` with `Generated { by: default_by, from: [] }` in-place per
node (no pool entries allocated on read — the writer creates the pool
ID on re-serialize). Used by every site that consumes JSON from
outside q2's source-tracking world.
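The difference between the two entry points can be illustrated with a toy tree. Everything here is a simplified assumption: `RawNode`, `Node`, and the string-valued `by` are hypothetical stand-ins, the real functions take a JSON reader and a `By` value, and the real node_path format is not specified beyond "tag name + parent context".

```rust
#[derive(Debug, Clone, PartialEq)]
pub enum SourceInfo {
    /// Reference into the source-info pool (the `s:` field).
    PoolRef(usize),
    /// Placeholder filled in by the completing reader.
    Generated { by: String },
}

/// Hypothetical pre-parsed input: `s` is None when `s:` was absent.
pub struct RawNode {
    pub tag: &'static str,
    pub s: Option<usize>,
    pub children: Vec<RawNode>,
}

#[derive(Debug, PartialEq)]
pub struct Node {
    pub tag: &'static str,
    pub source_info: SourceInfo,
    pub children: Vec<Node>,
}

#[derive(Debug, PartialEq)]
pub enum JsonReadError {
    MissingSourceInfoRef { node_path: String },
}

fn convert(raw: &RawNode, parent: &str, default_by: Option<&str>) -> Result<Node, JsonReadError> {
    // Best-effort path: parent context plus this node's tag.
    let node_path = if parent.is_empty() {
        raw.tag.to_string()
    } else {
        format!("{parent}/{}", raw.tag)
    };
    let source_info = match (raw.s, default_by) {
        (Some(id), _) => SourceInfo::PoolRef(id),
        (None, Some(by)) => SourceInfo::Generated { by: by.to_string() },
        (None, None) => return Err(JsonReadError::MissingSourceInfoRef { node_path }),
    };
    let children = raw
        .children
        .iter()
        .map(|child| convert(child, &node_path, default_by))
        .collect::<Result<Vec<_>, _>>()?;
    Ok(Node { tag: raw.tag, source_info, children })
}

/// Strict: a node without `s:` is an error.
pub fn read(raw: &RawNode) -> Result<Node, JsonReadError> {
    convert(raw, "", None)
}

/// Lenient: a node without `s:` gets a Generated placeholder.
pub fn read_completing_source_info(raw: &RawNode, default_by: &str) -> Result<Node, JsonReadError> {
    convert(raw, "", Some(default_by))
}
```

Nodes that already carry `s:` pass through both readers unchanged, which matches the pass-through behavior described for filter output.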
Five outside-world callers switched per the plan's per-caller table:
- `json_filter.rs` → `By::filter(filter_path, 0)` (atomic-kind for
filter-added nodes; pass-through nodes keep their original `s:`).
- `qmd-syntax-helper/{definition_lists,grid_tables}.rs` →
`By::unknown()`. Writer dispatch for these nodes shifts from
R1-empty to R5-synthesize, which is the correct round-trip behavior.
- `pampa/src/main.rs` (CLI `--from json`) → `By::unknown()`.
- `pampa/src/lua/readwrite.rs` (Lua `pandoc.read(_, "json")`) →
`By::unknown()`.
The strict reader catches two real writer bugs that previously
round-tripped silently through `SourceInfo::default()`:
1. `write_custom_block` and `stream_write_custom_block` synthesized
`Plain`/`Div` wrappers for slot encoding without `s:`. Same shape
in `write_custom_inline` / `stream_write_custom_inline` for the
`Span` wrapper and the `[block content]` placeholder Str. All now
inherit the parent CustomNode's `s_id`.
2. `Figure` did not emit `captionS` (Table did). Strict reader
rejected Figure captions; added `captionS` to both the buffered
and streaming Figure writers, and updated the Figure reader to
consume it. Same shape as Table's `captionS`.
Tests:
- `json_reader_smoke_tests.rs` reads Pandoc-format fixtures under
`tests/readers/json/` — switched to `read_completing_source_info`.
- `test_json_div_transforms.rs` mimics `--from json` with hand-crafted
pampa JSON — switched to match `main.rs`.
- Full pampa suite (3903 tests) + workspace suite (9727 tests) green.
Required adding `quarto-source-map` as a direct dep of `qmd-syntax-helper`
(previously transitive through `pampa`).
…ool`→`p` (Plan 7f Phase 5)
Phase 5 of Plan 7f compacts two top-level JSON keys to match the
rest of the wire format's single-character convention.
Writer (`crates/pampa/src/writers/json.rs`):
* `#[serde(rename = "a")]` on `NodeWithAttrJson::attr_s`; field-order
invariant preserved (a, c, s, t still alphabetic).
* `#[serde(rename = "p")]` on `AstContextJson::source_info_pool`;
alphabetic order under `astContext` preserved (files,
metaTopLevelKeySources, p).
* All 24 literal `"attrS"` keys and the 1 `"sourceInfoPool"` key
in object-construction sites updated; doc comments + the Figure
inline order-comment rewritten for the new alphabet (a, c,
captionS, s, t).
Reader (`crates/pampa/src/readers/json.rs`): symmetric reads of the
new keys (14 sites + 1 pool key); error variant messages and the
deserializer doc-block now reference `p` and `a` while keeping
the human-readable name "source-info pool".
TS:
* `ts-packages/pandoc-types/src/types.ts` — 11 `attrS:` interface
fields → `a:`; `RustQmdJson.astContext.sourceInfoPool` → `p`.
* `ts-packages/preview-renderer/src/types/sourceInfo.ts` &
`framework/Ast.tsx` — `AstContext.p` is the wire-format key; the
internal React-context field stays `sourceInfoPool` for readability.
* `ts-packages/annotated-qmd/src/{index.ts,block-converter.ts,
inline-converter.ts}` — wire-format accesses
(`block.attrS`/`inline.attrS`/`headS.attrS`/etc. → `.a`;
`json.astContext.sourceInfoPool` → `.p`); internal parameter
`attrS` renamed to `attrSource` for clarity. Tests, README,
`debug-figure.js`, and `check_mismatches.py` follow.
Audit (2026-06-01) confirmed `hub-client/`, `q2-preview-spa/`, and
`crates/hub/` don't pattern-match on these keys — they delegate to
the TS type packages.
Snapshot regeneration:
* 62 `.snap` files in `crates/pampa/snapshots/json/` regenerated
via `INSTA_UPDATE=always cargo nextest run -p pampa`. Diff is
pure key rename (`"attrS":`→`"a":`, `"sourceInfoPool":`→`"p":`)
plus a refreshed snapshot-source header reflecting the post-
bd-xvdop integration-tests layout (`tests/test.rs` →
`tests/integration/test.rs`). Commit sequencing: Phase 5
renames land first; Phase 4's pool-slot-0 commit follows and
regenerates the same 62 files for the +1 ID shift.
Example-fixture regeneration:
* 20 `ts-packages/annotated-qmd/examples/*.json` + the
`test/fixtures/math-with-attr.json` rebuilt by running
`cargo run --bin pampa -- -t json -i <each>.qmd`. Committed
fixtures dated to 2025-10-24 (commit 2b2337b) and were stale
against multiple unrelated pampa releases; regeneration is
required for the TS code (which now reads `a`/`p`) to find any
data at all.
Docs:
* `claude-notes/designs/provenance-contract.md` — wire-format key
references updated to `astContext.p`.
* `claude-notes/instructions/performance-profiling.md` — Python
canonicalize snippet uses `astContext["p"]`.
* Historical plans/research notes intentionally retain `attrS` /
`sourceInfoPool` since they describe state-as-of-then.
Verification:
* `cargo nextest run --workspace` → 9727 pass.
* `cargo xtask verify` (full hub-build leg) → all 12 steps green
including WASM rebuild + q2-preview-spa bundle.
* `hub-client` unit tests → 82/82 pass.
* `preview-renderer` tests → 205/205 pass.
Known side-issue (not blocking): `annotated-qmd` shows 2/156 test
failures — pre-existing pampa source-tracking off-by-one (inline
code + div key-source spans capture a preceding whitespace byte).
Filed as `bd-1d6io` with suspected-cause investigation pointing
at commit `38e889ad` (2026-05-24, multi-line inline-code-span
tokenization rework). Phase 5 only renamed JSON keys; no offset
computation was touched.
Plan: claude-notes/plans/2026-05-29-q2-preview-plan-7f-prereqs.md
…se 4 pool-slot)
The React framework's `stampUserEdits` walker (Plan 7f Phase 3) stamps
`s: USER_EDIT_SOURCE_INFO_ID` on every AST node a `setLocalAst` call
introduces without an existing `s:`. Until now the constant existed only
on the TS side (added in commit `7ac9f445`); the Rust writer never
pre-populated slot 0, so the stamp resolved to whatever happened to be
interned first in each document. Most stamps landed on benign
`Original{0..0}` entries, but the semantic was wrong — `s:0` should
*mean* "this came from a user edit", not "this was the first thing
interned." This commit makes the round-trip honest.
Writer (`crates/pampa/src/writers/json.rs`):
* `pub const USER_EDIT_SOURCE_INFO_ID: usize = 0;` defined alongside
`SourceInfoSerializer`. Docstring chains future reserved slots via
`+ 1` and points at the TS hand-mirror.
* `SourceInfoSerializer::new()` now pre-pushes a
`Generated{by: By::user_edit(), from: vec![]}` entry at index 0.
The slot exists in every JSON document the writer produces
regardless of whether any node references it.
* The 9 writer-side unit tests that asserted `pool.len() == N` after N
interns now assert `pool.len() == USER_EDIT_SOURCE_INFO_ID + 1 + N` (using
`let first_user_id = USER_EDIT_SOURCE_INFO_ID + 1;` locally) so a
future second reserved slot doesn't silently break call sites.
* New `test_reserved_slot_user_edit` pins the layout: a fresh
serializer has `pool[USER_EDIT_SOURCE_INFO_ID]` carrying
`Generated{by: user_edit, from: [], r: [0,0]}`. Rearranging reserved
slots fails this test rather than silently shifting IDs.
* New `test_user_edit_slot_id_matches_typescript_mirror` reads
`ts-packages/preview-renderer/src/types/sourceInfo.ts` via
`CARGO_MANIFEST_DIR`-relative path, parses the
`export const USER_EDIT_SOURCE_INFO_ID = N;` literal, and asserts
`N == 0`. Catches rename, restructure, or value drift on either side.
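The writer-side changes above can be sketched roughly as follows. This is a minimal illustration, not the code in `crates/pampa/src/writers/json.rs`: the `SourceInfo` enum is a simplified stand-in, the `intern` and `pool` method signatures are hypothetical, and equality-based deduplication in `intern` is an assumption.

```rust
// Sketch only: simplified stand-ins, not the real pampa / quarto-source-map types.

/// Reserved pool index meaning "this node came from a user edit".
pub const USER_EDIT_SOURCE_INFO_ID: usize = 0;

#[derive(Debug, Clone, PartialEq)]
pub enum SourceInfo {
    /// A byte range in an original file.
    Original { file_id: usize, start: usize, end: usize },
    /// Synthesized content; `by` names the producer kind.
    Generated { by: String, from: Vec<SourceInfo> },
}

pub struct SourceInfoSerializer {
    pool: Vec<SourceInfo>,
}

impl SourceInfoSerializer {
    /// Pre-pushes the reserved user-edit entry so slot 0 exists in every
    /// document, whether or not any node references it.
    pub fn new() -> Self {
        let user_edit = SourceInfo::Generated {
            by: "user-edit".to_string(),
            from: Vec::new(),
        };
        Self { pool: vec![user_edit] }
    }

    /// Returns the pool index for `info`, reusing an equal entry if one
    /// is already present (assumed behavior, hypothetical signature).
    pub fn intern(&mut self, info: SourceInfo) -> usize {
        if let Some(id) = self.pool.iter().position(|entry| *entry == info) {
            return id;
        }
        self.pool.push(info);
        self.pool.len() - 1
    }

    /// Read-only view of the pool (hypothetical accessor).
    pub fn pool(&self) -> &[SourceInfo] {
        &self.pool
    }
}
```

With this shape, the first entry a document interns lands at `USER_EDIT_SOURCE_INFO_ID + 1`, which is why the length assertions are phrased relative to the constant rather than as bare literals.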
Reader-side and TS-side tests that construct their own pool literals
were left as-is — they're not asserting against the writer's
reserved-slot contract.
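The TS-parity check described above boils down to extracting one integer literal from TypeScript source text. A minimal sketch follows, assuming the declaration sits on a single line with no type annotation. The helper name `parse_ts_usize_const` is hypothetical, and the real test reads the file from a `CARGO_MANIFEST_DIR`-relative path instead of taking a string.

```rust
// Sketch only: `parse_ts_usize_const` is a hypothetical helper, not the
// function used by the real test.

/// Finds `export const <name> = <N>;` in TypeScript source text and
/// returns N. Returns `None` if the declaration is missing, renamed, or
/// not a plain non-negative integer literal.
pub fn parse_ts_usize_const(ts_source: &str, name: &str) -> Option<usize> {
    let prefix = format!("export const {name} =");
    ts_source.lines().find_map(|line| {
        // Skip lines that do not start with the exact declaration.
        let rest = line.trim().strip_prefix(prefix.as_str())?;
        // Drop the trailing semicolon and whitespace, then parse.
        rest.trim().trim_end_matches(';').trim().parse::<usize>().ok()
    })
}
```

A test built on this would assert that the parsed value equals the Rust-side constant, so a rename or restructure yields `None` and a value drift yields a different number. Either way the assertion fails.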
Snapshot regeneration:
* 62 `.snap` files in `crates/pampa/snapshots/json/` regenerated.
Diff is exactly the plan's predicted shape: every `"s":N` reference
shifts to `"s":N+1`, every `Concat` piece `source_info_id` shifts
by +1, and each pool gains a new entry at index 0:
`{"d":{"by":{"kind":"user-edit"}},"r":[0,0],"t":4}`.
Example-fixture regeneration:
* 20 `ts-packages/annotated-qmd/examples/*.json` +
`test/fixtures/math-with-attr.json` rebuilt by running
`cargo run --bin pampa -- -t json -i <each>.qmd`. Same +1 shift on
every `s:` reference plus the new pool[0] entry. Required because
the TS test suite reads these fixtures and indexes into the pool
by the `s:` field.
Verification:
* `cargo nextest run --workspace` → 9731 pass (+4 vs Phase 5: the new
reserved-slot and TS-parity tests, each running once as a unit test
and once via the integration binary).
* `cargo xtask verify` (full hub-build leg) → all 12 steps green
including WASM rebuild + q2-preview-spa bundle.
* annotated-qmd: 2/156 known failures remain (bd-1d6io,
source-tracking off-by-one) — unchanged from Phase 5; not caused
by this pool shift.
Plan: claude-notes/plans/2026-05-29-q2-preview-plan-7f-prereqs.md
…tructors
Foundation for Plan 7f Phase 6 (test audit) and Phase 6.5 (production
residue sweep). Adds, in `crates/quarto-source-map/src/source_info.rs`:
- `By::test_scaffold()` — non-atomic, `kind: "test-scaffold"`. Paired
with `SourceInfo::for_test()` for tests that need a `SourceInfo`
field but have no real provenance to record.
- `SourceInfo::for_test()` — convenience that returns
`Generated{by: test_scaffold(), from: []}`. Replaces
`SourceInfo::default()` in test code; intentionally produces
*different* writer dispatch (R5/R3 synthesize vs R1-empty-range
copy) because the new behavior is the correct one for AST without
real source bytes.
- `By::config_default()` / `By::programmatic_config()` — non-atomic
sentinel kinds for `ConfigValue` residue sites (Phase 6.5
`config_value.rs` fixes lean on these).
- `By::is_programmatic_sentinel()` — predicate matching
`config-default | programmatic-config | unknown`. Replaces the
pre-7f `source == &SourceInfo::default()` comparison in
`navigation_href.rs`.
Six new unit tests cover: constructor shape (kind/data) for each new
`By::*`, non-atomicity for all four new kinds, `is_programmatic_sentinel`
positive/negative cases, and `SourceInfo::for_test` shape. The existing
`test_by_is_atomic_kind` was extended with three new negative
assertions so a future change can't silently promote `test-scaffold`,
`config-default`, or `programmatic-config` to atomic without breaking
the test.
No production callers yet — those land in subsequent commits per the
Phase 6 / 6.5 work-item split in CURRENT.md.
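For orientation, the constructors and predicate described in this commit might look roughly like the sketch below. The `By` and `SourceInfo` shapes are simplified assumptions (the real `By` also carries `data`, omitted here), and the free function `is_programmatic` is a hypothetical illustration of how a caller could use the predicate.

```rust
// Sketch only: simplified stand-ins for the types in
// crates/quarto-source-map/src/source_info.rs.

#[derive(Debug, Clone, PartialEq)]
pub struct By {
    pub kind: &'static str,
}

impl By {
    pub fn test_scaffold() -> Self { By { kind: "test-scaffold" } }
    pub fn config_default() -> Self { By { kind: "config-default" } }
    pub fn programmatic_config() -> Self { By { kind: "programmatic-config" } }
    pub fn unknown() -> Self { By { kind: "unknown" } }

    /// True for the sentinel kinds named in the commit message.
    pub fn is_programmatic_sentinel(&self) -> bool {
        matches!(self.kind, "config-default" | "programmatic-config" | "unknown")
    }
}

#[derive(Debug, Clone, PartialEq)]
pub enum SourceInfo {
    Original { file_id: usize, start: usize, end: usize },
    Generated { by: By, from: Vec<SourceInfo> },
}

impl SourceInfo {
    /// Placeholder provenance for tests with no real source bytes.
    pub fn for_test() -> Self {
        SourceInfo::Generated { by: By::test_scaffold(), from: Vec::new() }
    }
}

/// Hypothetical caller-side check: is this value programmatic residue?
pub fn is_programmatic(source: &SourceInfo) -> bool {
    matches!(source, SourceInfo::Generated { by, .. } if by.is_programmatic_sentinel())
}
```

Matching on the kind instead of comparing against a default value means the check no longer depends on which `Original` range happens to be the default.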
…olding
Plan 7f Phase 6 — first batch of the test audit. All sites in this
commit are structural test scaffolding (constructors that require a
SourceInfo field; no real source bytes exist for the hand-crafted
fixture).
Sites touched (test code only):
- crates/quarto-xml/src/types.rs — 11 sites in `mod tests`
(XmlAttribute / XmlElement constructor scaffolding).
- crates/quarto-yaml-validation/src/tests.rs — 3 sites in
`make_yaml_*` helpers.
- crates/quarto-yaml-validation/src/validator.rs — 14 sites inside
the file's `#[cfg(test)] mod tests` (yaml_scalar / yaml_array /
yaml_object / test_navigate_nested fixtures).
- crates/quarto-yaml-validation/src/schema/parsers/combinators.rs:66
— local `source_info()` test helper.
- crates/quarto-yaml-validation/src/schema/helpers.rs:172 — same
pattern, local `source_info()` test helper.
- crates/quarto-ast-reconcile/src/generators.rs:631 — proptest
generator for Shortcode.
- crates/quarto-core/tests/integration/{jupyter_integration,
navigation_e2e, navigation_merge, engine_merge, attribution_*}.rs
— 35 sites across the 8 quarto-core integration tests that build
hand-crafted Pandoc AST + ConfigValue fixtures.
Behavior implications (per CURRENT.md's writer dispatch note):
- `SourceInfo::default()` is `Original{FileId(0),0,0}` →
`preimage_in(FileId(0))` returns `Some(0..0)` (empty range) → R1
copies zero bytes. `for_test()` is
`Generated{by:test_scaffold, from:[]}` → `preimage_in` returns
`None` → R5/R3 synthesize (or pass-through wrapper). The new
behavior is the correct one for AST with no real source bytes,
and no test in this batch asserts on writer byte output.
- `navigation_href.rs:382` still uses `source == &SourceInfo::default()`
(Phase 6.5 will swap this to `is_programmatic_sentinel()`). For
the navigation_e2e / _merge / attribution tests in this commit,
the swap is benign: `for_test()` no longer equals `default()`,
but `resolve_byte_range()` returns `None` for the empty-from
`Generated`, so navigation_href takes the "Concat/Filter"
fall-through path and returns `raw` unchanged — same outcome as
the old explicit short-circuit.
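The dispatch difference described above can be illustrated with a small model. This is a sketch under simplifying assumptions: `preimage_in` here only handles the two cases the commit message mentions (an `Original` in the queried file, and a `Generated` with empty `from`), and `WriterAction` / `choose_action` are hypothetical names standing in for the writer's R1 versus R5/R3 rules.

```rust
// Sketch only: a toy model of the writer dispatch, not the real writer.
use std::ops::Range;

#[derive(Debug, Clone, Copy, PartialEq)]
pub struct FileId(pub usize);

#[derive(Debug, Clone, PartialEq)]
pub enum SourceInfo {
    Original { file_id: FileId, start: usize, end: usize },
    Generated { by: &'static str, from: Vec<SourceInfo> },
}

impl SourceInfo {
    /// The old test placeholder: `Original{FileId(0),0,0}`.
    pub fn old_default() -> Self {
        SourceInfo::Original { file_id: FileId(0), start: 0, end: 0 }
    }

    /// The new test placeholder: `Generated{by: test_scaffold, from: []}`.
    pub fn for_test() -> Self {
        SourceInfo::Generated { by: "test-scaffold", from: Vec::new() }
    }

    /// Byte range this node occupies in `file`, if any (simplified).
    pub fn preimage_in(&self, file: FileId) -> Option<Range<usize>> {
        match self {
            SourceInfo::Original { file_id, start, end } if *file_id == file => {
                Some(*start..*end)
            }
            // Simplification: non-empty `from` chains are not modeled.
            _ => None,
        }
    }
}

#[derive(Debug, PartialEq)]
pub enum WriterAction {
    /// R1-style: copy the original bytes verbatim.
    CopyBytes(Range<usize>),
    /// R5/R3-style: synthesize output from the AST.
    Synthesize,
}

pub fn choose_action(info: &SourceInfo, file: FileId) -> WriterAction {
    match info.preimage_in(file) {
        Some(range) => WriterAction::CopyBytes(range),
        None => WriterAction::Synthesize,
    }
}
```

In this model the old placeholder yields an empty copy range, while `for_test()` routes to synthesis, which matches the behavior shift the bullets describe.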
`schema/merge.rs:32,51,88` and `schema/mod.rs:256` (the 4 production
SchemaError::InvalidStructure sites) are intentionally not touched;
they belong to Phase 6.5's `location: Option<SourceInfo>` refactor.
Test results: per-crate `cargo nextest run` clean across all four
crates (24/24 quarto-xml, 265/265 quarto-yaml-validation, 218/218
quarto-ast-reconcile, 2199/2199 quarto-core).
Plan 7f Phase 6 — pampa batch. All swapped sites are test
scaffolding: pampa/tests/* (the 18 integration test files —
156 sites) and the `#[cfg(test)] mod tests` blocks inside
pampa/src/* (85 sites). Plus crates/pampa/src/lua/filter_tests.rs
(included via `#[cfg(test)] #[path = "filter_tests.rs"] mod` —
the whole file is test code, 156 more sites).
Test results: `cargo nextest run -p pampa` clean (3907/3907 pass,
2 skipped). No assertion-on-byte-output tests in this batch
regressed under the R1-empty-range → R5/R3-synthesize dispatch
shift that follows from for_test()'s non-Original shape.
Production-residue audit (deferred): per `git grep
'SourceInfo::default()' crates/pampa/src/`, 42 sites remain in
pampa src that are NOT inside `#[cfg(test)]`. Per-file breakdown:
- `readers/json.rs` — 7 sites, all marked "Legitimate default:
backward compat" for legacy Pandoc JSON without source info.
Explicitly allowed by `provenance-contract.md` §10. Will need
`#[allow(deprecated)]` annotations under Phase 7's
`#![deny(deprecated)]`.
- `lua/types.rs` (8), `lua/utils.rs` (10), `lua/readwrite.rs` (2)
— Lua-side fallbacks where `filter_source_info` is expected to
overwrite `SourceInfo::default()` with `Generated{by:filter,…}`
before the AST is consumed. Producer contract acknowledges
this pattern at the call-stack level.
- `citeproc_filter.rs` (3), `pandoc/meta.rs` (3),
`writers/json.rs` (2), `toc.rs` (2),
`template/config_merge.rs` (5) — genuine production residue
the Phase 6.5 plan did NOT enumerate. Most need a new
`By::citeproc()`/`By::yaml_error_recovery()`/`By::toc_synth()`
kind or routing through `By::programmatic_config()` /
`By::unknown()`. Surfacing as a per-site decision before Phase
7 deprecation lands.
This commit ships the 312 test-only swaps (test_scaffold writer
dispatch is benign for tests that don't assert on byte output).
Production sites tracked separately for Plan 7f Phase 6.5
extension.
Plan 7f Phase 6: final test-audit batch. Covers all remaining crates
with `SourceInfo::default()` test-scaffolding sites: 57 PURE_TEST
files (where no production residue exists) bulk-swapped end-to-end,
plus 28 MIXED files where the swap was scoped to the
`#[cfg(test)] mod tests` region. Plus one `tests/integration/*.rs`
file (`quarto-sass/.../brand_config_test.rs`) that's all test code by
virtue of living under `tests/`.
Affected crates: quarto-core (all transforms, stages, engine helpers,
project plumbing), quarto-navigation (all subviews),
quarto-pandoc-types/config_value.rs (95 test sites + 1 unused
sentinel-equality test pinned to default(); see below),
quarto-pandoc-types/inline.rs, quarto-config (all submodules),
quarto-sass, quarto-doctemplate, quarto-yaml, quarto-publish, plus
the integration brand_config_test.
Two assertion-pin fixes after sed swept too eagerly:
- `quarto-core/src/stage/stages/engine_execution.rs:1378`:
`test_execution_context_has_source_info` asserts against the
production `ExecutionContext::new` default. RHS reverted to
`SourceInfo::default()` with a comment; Phase 7's deprecation will
surface engine/context.rs:92 as a residue site and the assertion
gets updated alongside.
- `quarto-pandoc-types/src/inline.rs:1459`: `source_info_attr_empty`
pins the `InlineAttr::new` fallback. RHS reverted to `default()`;
this test is on Phase 6.5's deletion list (the InlineAttr::new
signature refactor removes the fallback entirely).
Production residue remains (not part of this commit, surfaced for
Phase 6.5 + Phase 7):
- Planned Phase 6.5 sites (enumerated in CURRENT.md):
config_value.rs (5), project_resources.rs (2), navigation_href.rs
(1+2 follow-up), citeproc/output.rs (1), config/materialize.rs (3),
listing/feed/stage.rs (2), yaml-validation/schema/merge.rs+mod.rs
(4), pandoc-types/inline.rs (InlineAttr refactor + IntermediateAttr
widening, ~10 sites).
- Discovered residue not in plan: ~70 additional production sites
across pampa (citeproc_filter, toc, pandoc/meta, writers/json,
template/config_merge, lua/types, lua/utils, lua/readwrite),
quarto-analysis, quarto-core engine/jupyter, quarto-core transforms
(callout_resolve, categories_sidebar, shortcode_resolve,
sidebar_auto, theorem, …), quarto-navigation. These will be
surfaced by Phase 7's `#![deny(deprecated)]` once the deprecation
attribute lands; fixes can be applied per-site or temporarily
allow-listed at that time.
- Legitimate `SourceInfo::default()` calls retained per the producer
contract: 7 in `pampa/src/readers/json.rs` (Pandoc legacy-JSON
backward compat, explicitly allowed by `provenance-contract.md`
§10), 1 in `quarto-source-map/src/source_info.rs` (the actual
`impl Default for SourceInfo` body; Phase 7 deprecates this).
Workspace tests: 9736/9736 pass, 196 skipped.
…config_default / programmatic_config
Plan 7f Phase 6.5 — first production-residue commit. Replaces four
of the five `SourceInfo::default()` sites in
`crates/quarto-pandoc-types/src/config_value.rs` with explicit
`Generated{by:…}` provenance:
- `impl Default for ConfigValue` (line 415) →
`Generated{by: By::config_default()}`. The empty-Map sentinel
used by every `ConfigValue::default()` caller.
- `ConfigValue::from_path` (line 539) →
`Generated{by: By::programmatic_config()}`. WASM-bridge
programmatic injection.
- `ConfigValue::insert_path` intermediate map + key_source (lines
822, 826) → same `programmatic_config` provenance.
- Doc-comment example for `insert_path` updated to show the new
shape.
(Fifth `default()` site was on the assertion side of the now-fixed
`source_info_attr_empty` test — that test still asserts against
the production fallback in `InlineAttr::new`, which Phase 6.5's
InlineAttr refactor removes.)
Reader-side compatibility: `crates/pampa/src/readers/json.rs:2212`
(top-level meta) updated to match. The JSON wire format does not
carry a per-meta `s:` field (Pandoc-compatible), so the reader
stamps the meta with the same `config_default` kind the writer's
`ConfigValue::default()` now produces. Without this, every JSON
round-trip would observably drop the meta's source_info; the
`test_json_roundtrip_simple_paragraph` test caught it. The five
other "Legitimate default" sites in the same function
(2191/2195/2199/2315/2339 — backward-compat for legacy
Pandoc-only JSON without `key_sources`) are deliberately left as
`SourceInfo::default()` for now; Phase 7's deprecation will surface
them as `#[allow(deprecated)]` candidates.
Workspace tests: 9736/9736 pass, 196 skipped.
Plan 7f Phase 6.5 — second production-residue commit. Replaces the
remaining enumerated sites in `quarto-core`:
- `crates/quarto-core/src/project_resources.rs:123` —
`Pattern::without_source` was using `SourceInfo::default()` as a
scaffolding sentinel. Now `Generated{by: By::unknown()}`.
- `crates/quarto-core/src/project_resources.rs:541` — Engine /
Lua-filter resource entries don't carry a YAML source location;
the call to `canonicalize_within_project` still requires a
`SourceInfo` per the current signature. Replaced
`&SourceInfo::default()` with `&SourceInfo::generated(By::unknown())`.
Follow-up beads issue **bd-3az78** filed to refactor
`canonicalize_within_project` to take `Option<&SourceInfo>`.
- `crates/quarto-core/src/transforms/navigation_href.rs:382` — the
programmatic-sentinel detector. Pre-Phase-6.5 code compared
`source == &SourceInfo::default()`; that equality survives only
as long as `Original{FileId(0),0,0}` is the canonical sentinel.
Replaced with the producer-side predicate:
`let SourceInfo::Generated { by, .. } = source && by.is_programmatic_sentinel()`.
Matches the `config-default | programmatic-config | unknown`
set introduced earlier in Phase 6.5. Doc-comment for the
function updated to describe the new shape.
Workspace tests: 9736/9736 pass, 196 skipped.
…tion → Option<SourceInfo>
Plan 7f Phase 6.5 — eliminates the last residual `SourceInfo::default()`
sites in quarto-yaml-validation. The variant's location field is now
`Option<SourceInfo>`, distinguishing two semantically distinct cases:
- **`Some(...)`** — error arose while validating user-supplied YAML
against a schema. ~33 call sites in
`schema/{helpers,parser,parsers/*}.rs` already pass a real
`value.source_info.clone()` / `item.source_info.clone()` from the
parsed YAML node; each wrapped in `Some(...)`.
- **`None`** — error describes a bug in the schema *definition*
itself (no user-YAML to point at). 4 sites:
`schema/merge.rs:32, 51, 88` and `schema/mod.rs:250` previously
passed `quarto_yaml::SourceInfo::default()` as a placeholder.
Formatter (`error.rs:33-46`) now branches on `Option`: present →
`"… (at offset N)"`, absent → no span suffix.
Test pattern-matching at all destructure sites uses `{ message, .. }`
so no test code needed updating. Added a regression test
`test_schema_error_invalid_structure_display_no_location` for the
new None branch.
The compiler flagged 37 mismatched-types errors across 7 files, and
the `Some(...)` wrap is mechanical at every call site (it is the fix
rustc's own E0308 diagnostic suggests).
Workspace tests: 9737/9737 pass, 196 skipped (one new test).
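The `Option<SourceInfo>` split and the formatter branch can be sketched as below. This is an illustration only: `SourceInfo` is reduced to a single hypothetical `start_offset` field, and since the commit message elides the leading part of the rendered message, the sketch simply prints `message` before the optional suffix.

```rust
// Sketch only: simplified stand-ins for the quarto-yaml-validation types.
use std::fmt;

#[derive(Debug, Clone, PartialEq)]
pub struct SourceInfo {
    /// Hypothetical field: byte offset of the offending YAML node.
    pub start_offset: usize,
}

#[derive(Debug)]
pub enum SchemaError {
    InvalidStructure {
        message: String,
        /// `Some` for errors in user YAML, `None` for schema-definition bugs.
        location: Option<SourceInfo>,
    },
}

impl fmt::Display for SchemaError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            SchemaError::InvalidStructure { message, location } => {
                write!(f, "{message}")?;
                // Only append a span suffix when a location is present.
                if let Some(loc) = location {
                    write!(f, " (at offset {})", loc.start_offset)?;
                }
                Ok(())
            }
        }
    }
}
```

Tests that destructure with `{ message, .. }` keep compiling under this change, because the pattern never names the `location` field.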
…t SourceInfo
Plan 7f Phase 6.5: eliminates the empty-AttrSourceInfo sentinel that
was the last `SourceInfo::default()` site in `quarto-pandoc-types`.
`InlineAttr::new` is now a three-argument constructor that requires
the caller to supply a real `source_info`. A `new_from_attr_source`
convenience preserves the "derive from non-empty AttrSourceInfo" path
for the two test sites that legitimately want it.
Producer-side: widened the
`PandocNativeIntermediate::IntermediateAttr` enum variant from
`(Attr, AttrSourceInfo)` to `(Attr, AttrSourceInfo, SourceInfo)`,
paying the source_info acquisition once at the producer instead of
three times at each consumer. All three production consumers
(`treesitter.rs:558`, `treesitter_utils/caption.rs:35`,
`treesitter_utils/paragraph.rs:27`) now destructure the third field
and pass it straight through to `InlineAttr::new`.
Producer constructors that emit `IntermediateAttr`:
- `treesitter.rs:1166` (commonmark_specifier): passes
`node_source_info_with_context(node, context)`.
- `treesitter.rs:1183` (unnumbered_specifier): same.
- `treesitter.rs:1202` (attribute_specifier empty fallback): same.
- `treesitter_utils/commonmark_attribute.rs:58`: gained a `span`
parameter; callers supply it from their local tree-sitter node.
- `treesitter_utils/info_string.rs:30`: re-uses the language-source
range (no separate parent span available).
- `treesitter_utils/language_specifier.rs:116`: uses
`node_source_info_with_context(node, context)` over the
language_specifier node.
- `treesitter_utils/language_specifier.rs:161`: dead-code fallback in
`process_nested_language_specifier`, updated for consistency.
Eight consumer destructure sites updated to ignore the new third
field with `, _` (atx_heading, code_span_helpers, editorial_marks,
fenced_code_block ×2, fenced_div_block, span_link_helpers ×2). None
of these production consumers currently uses the intermediate's
source_info; they take their span from the parent tree-sitter node
directly.
Test-code call sites
(`InlineAttr::new(empty_attr(), AttrSourceInfo::empty(), …)`), six
sites in `filters.rs`, `writers/plaintext.rs`, `lua/types.rs`,
`lua/filter.rs`, pass `SourceInfo::for_test()` as the third argument.
Two test sites in `inline.rs` that exercise the `AttrSourceInfo` →
`source_info` derivation moved to the `new_from_attr_source`
convenience method.
Deletes the obsolete `source_info_attr_empty` test (the case it
asserted, empty AttrSourceInfo + InlineAttr::new fallback to
`SourceInfo::default()`, is now structurally impossible).
Doc-comment for `AttrSourceInfo` at `attr.rs:44-46` updated: the old
"fall back to `SourceInfo::default()`" recipe no longer matches
reality (theorem.rs and proof.rs fall back to `None` already).
Workspace tests: 9736/9736 pass, 196 skipped.
Phase 6 (test audit) and Phase 6.5 (production residue sweep — enumerated sites) are complete; full `cargo xtask verify` passes all 12 steps including the WASM build leg. Plan file checkboxes updated and a new "Discovered production residue" section catalogues the ~70 unplanned `SourceInfo::default()` sites the Phase 6 sweep surfaced. Per the plan's "-D deprecated strategy", these are deferred to Phase 7's compiler-driven audit.
Plan 7f Phase 6.5 extension — apply explicit `By::*` kinds to the
~25 pampa production sites the original plan didn't enumerate.
New `By::citeproc()` constructor (atomic, non-sentinel): citeproc-
rendered content (citation Str replacements, bibliography `Div`s,
`#refs` wrappers) generated by CSL processing. Atomic — the user
edits citation styles via CSL, not through the preview's inline
editing surface. Added to `is_atomic_kind()`'s match arm.
Per-site fixes:
- `pampa/src/template/config_merge.rs` (5 sites — lang default,
pagetitle, top-level template-defaults map) → `By::config_default()`.
Template defaults are the canonical "no value in user config, use
this fallback" semantic.
- `pampa/src/toc.rs:98, 190` (TocEntry → ConfigValue,
NavigationToc → ConfigValue) → `By::programmatic_config()`.
Programmatic derivation from in-memory TOC structures.
- `pampa/src/citeproc_filter.rs` (3 sites — citation Str, bib
entry Div, refs Div wrapper) → `By::citeproc()`.
- `pampa/src/pandoc/meta.rs:95, 97, 231` (yaml-markdown-syntax-error
recovery Span + yaml-tagged-string Span) → reuse the caller's
`source_info` for both wrapper and inner Str so attribution points
at the offending YAML range. The wrapper has the same bytes as the
inner scalar — no new `By::` kind needed.
- `pampa/src/writers/json.rs:604, 625` (yaml-tagged-string Span
wrappers around Glob / Expr values) → same fix; reuse the value's
`source_info` for both wrapper and inner Str.
- `pampa/src/lua/types.rs` (8 sites — Lua-side inline construction
helpers), `pampa/src/lua/utils.rs` (10 sites — Lua block_to_inlines
LineBreak separators), `pampa/src/lua/readwrite.rs` (2 sites —
Lua → ConfigValue conversion) → `By::unknown()`. These are
Lua-side synthesis; the producer contract acknowledges that
`filter_source_info` may overwrite with `Generated{by:filter,...}`
on the way back out from a user filter.
Snapshot regenerated: `crates/pampa/snapshots/json/yaml-tags.snap`
(1 file). The diff is correct-behavior: yaml-tagged-string spans
now share the YAML source range with their inner content (previously
the wrapper had `Original{0,0,0}`), so the writer's pool intern
coalesces three references that used to be three default entries.
No semantic regression — fewer dead pool entries, identical
inline-level source tracking.
Remaining pampa residue (6 sites in `readers/json.rs`): all
explicitly allowed by `provenance-contract.md` §10 (legacy Pandoc
JSON backward-compat). They will need `#[allow(deprecated)]` when
Phase 7's deprecation lands.
Workspace tests: 9737/9737 pass, 196 skipped.
…quarto-config,quarto-citeproc): Phase 6.5 residue cleanup workspace-wide
Plan 7f Phase 6.5 extension — apply explicit `By::*` kinds across
the remaining ~70 production sites the original plan didn't
enumerate. Every non-test `SourceInfo::default()` in the workspace
now has a deliberate provenance kind; the only retained sites are
the 5 contract-allowed legacy-Pandoc-JSON sites in
`crates/pampa/src/readers/json.rs`.
New `By::*` kinds (3) added in `crates/quarto-source-map/src/source_info.rs`:
- `By::jupyter_output()` — atomic. Synthesized blocks/inlines from
kernel execution (Jupyter cell stdout / stderr, rich-display MIME
bundles, error tracebacks). Regenerated on every re-run, so the
preview's inline editor must not touch it.
- `By::callout()` — non-atomic. Wraps callout-decoration synthesis
(default-title injection, screen-reader-only type announcement);
the user's actual callout body stays editable through the preview.
Atomicity decision per the worked example in
`claude-notes/designs/provenance-contract.md` §3.
- `By::citeproc()` was added earlier in this phase and is reused
here for `quarto-citeproc/src/output.rs:1274`.
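The atomic versus non-atomic split above can be pictured with a tiny predicate. This is a sketch with loud assumptions: the kind strings `"citeproc"`, `"jupyter-output"`, and `"callout"` are guessed spellings, the real `is_atomic_kind()` presumably lists other pre-existing atomic kinds as well, and `inline_editable` is a hypothetical name for the preview-side gate.

```rust
// Sketch only: kind spellings and function shapes are assumptions.

/// True when content of this provenance kind is regenerated wholesale
/// and must not be edited inline in the preview.
pub fn is_atomic_kind(kind: &str) -> bool {
    // Only the two atomic kinds named in this phase are listed here;
    // the real match arm likely contains more.
    matches!(kind, "citeproc" | "jupyter-output")
}

/// Hypothetical preview-side gate: may the inline editor touch a node
/// whose provenance has this kind?
pub fn inline_editable(kind: &str) -> bool {
    !is_atomic_kind(kind)
}
```

Under these assumptions a callout's decoration provenance leaves the node editable, while kernel output and citeproc-rendered content are rejected.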
Per-site application:
- `quarto-citeproc/src/output.rs:1274` (`empty_source_info()` helper)
→ `By::citeproc()`.
- `quarto-core/src/engine/context.rs:92` (`ExecutionContext::new`)
→ `By::unknown()`. Matching assertion in
`engine_execution.rs:1378` updated.
- `quarto-core/src/engine/jupyter/{output.rs ×11, transform.rs ×1}`
→ `By::jupyter_output()`. Stream output, error tracebacks, MIME
bundle conversion (text/plain, text/html, text/markdown,
text/latex, image/* placeholders), and the inline-Code →
Inline-Str expression-result swap in `transform.rs:279`.
- `quarto-core/src/transforms/callout_resolve.rs` (3 sites) →
`By::callout()`. Default-title Str, screen-reader-only Span
wrapper, both child source_infos.
- `quarto-core/src/transforms/shortcode_resolve.rs` —
`config_value_to_inlines` (9 sites) + `lua_result_to_shortcode_result`
(1 site) + `flatten_blocks_to_inlines` (1 inter-paragraph
`Space` separator) reuse the surrounding `ConfigValue.source_info`
or the shortcode token's source range so the canonical stamper
pass downstream can wrap with the `Invocation` anchor. (The full
enrichment chain — `Generated{by: shortcode, from: [Invocation]}`
— happens at `stamp_block` / `stamp_inline`; this commit fixes
the *innermost* synthesis sites.)
- `quarto-core/src/transforms/sidebar_auto.rs` (4),
`categories_sidebar.rs` (3), `sidebar_render.rs` (2),
`sidebar_generate.rs` (1), `page_nav_render.rs` (1),
`navbar_render.rs` (1), `footer_render.rs` (1),
`toc_render.rs` (1), `listing_render.rs` (1),
`navigation_enrich.rs` (1) → `By::programmatic_config()`.
All synthesizing config-storage of rendered-HTML strings or
navigation items.
- `quarto-core/src/stage/stages/metadata_merge.rs` (4),
`listing_item_info.rs` (2), `math_js.rs` (1) →
`By::programmatic_config()`. Stage-processing intermediates
where source bytes don't exist.
- `quarto-core/src/project/listing/feed/{stage.rs, complete.rs}`,
`listing/post_render_upgrade/substitute.rs` — five diagnostic
builders → `By::unknown()`. Span-less diagnostics degrade
gracefully through the existing `with_location` formatter.
- `quarto-core/src/project/listing/config.rs:113`
(`Listing::default().categories_source`) →
`By::programmatic_config()`. Doc comment updated.
- `quarto-config/src/materialize.rs` (3 sites: `key_source`,
`MergedValue::Map` source_info fallback, missing-path
`ConfigValue::null`) → `By::programmatic_config()` /
`By::unknown()` per site.
- `quarto-analysis/src/transforms/shortcode.rs` (7 sites) — reuse
the shortcode token's source range; same pattern as the
canonical `shortcode_resolve.rs` enrichment, in the simpler
static-analysis form.
- `quarto-navigation/src/{page_nav,navbar,sidebar,footer,item}.rs`
(16 sites) → `By::programmatic_config()`. Navigation items
synthesized without YAML source context.
- `quarto-core/src/transforms/theorem.rs:312` doc-comment update
(the actual fall-back recipe is `None`, not `SourceInfo::default()`,
in the post-Phase-6.5 code).
Doc-comment-only references in
`shortcode_resolve.rs:172` and `navigation_href.rs:381` retained
as historical references — they describe pre-Phase-6.5 behavior.
Workspace tests: 9739/9739 pass, 196 skipped (3 new tests for the
new `By::*` kinds in `quarto-source-map`).
Updates CURRENT.md to reflect that the discovered production residue (~70 unplanned sites) was addressed inline rather than deferred to Phase 7. Three new `By::*` kinds were defined during the sweep: `By::citeproc()`, `By::jupyter_output()`, `By::callout()`. After this commit, only 6 production `SourceInfo::default()` callers remain — 5 contract-allowed legacy-Pandoc-JSON sites in `pampa/src/readers/json.rs` and the `impl Default for SourceInfo` body itself. Full `cargo xtask verify` passes all 12 steps including WASM/SPA. Phase 7's compiler-driven audit now has a much smaller surface to cover — most of the heavy lifting moved into Phase 6.5.
c9dcc6f to 88ef2ad
Draft PR for CI.
Provenance epic plans 3-7 are complete; provenance data is flowing and impossible edits to atomic elements are both blocked on the front end and soft-dropped by the incremental writer.
Next up: